1. Introduction.



An understanding of the impact of modelling assumptions upon the optimal
military strategies derived from mathematical models is essential for
determining solutions to complex problems of international significance.
In this paper we continue the
study of one of the authors on the effects of various modelling assumptions 
on the structure of optimal tactical allocation policies by systematically 
contrasting the solutions for a sequence of idealized models. These combat 
models are too simple to be taken literally but should be interpreted as 
indicating general principles to serve as hypotheses for subsequent higher 
resolution studies of real world problems via computer simulation or field 
experimentation.

In previous papers [34], [33], [36], [37], [38] one of us has 
studied the optimal control of deterministic Lanchester attrition processes. 
A major result of this previous research was that optimal tactical alloca- 
tion policies are quite sensitive to the precise nature of the combat 
model adopted, even as to whether the tactical scenario lasts for a 
specified period of time or terminates only when a predetermined "break- 
point" has been reached. We have shown [36] that whether or not concentra- 
tion of all fire on a single enemy target type is always the optimal fire 
distribution policy depends on whether, for example, enemy target types 
undergo a "square-law" or "linear-law" attrition process (see also [38]). 

In the paper at hand, we examine the effects on the structure of the 
optimal fire distribution policy of whether combat attrition is modelled 
as a deterministic or a stochastic process. Although there has been a 
continuing discussion among military operations analysts about the relative 
merits of deterministic versus stochastic combat attrition models (in
particular, see [4], [9]), there apparently has been no systematic attempt 
to contrast optimal military strategies derived from such different 
modelling approaches. 

In order to keep the impact of modelling assumptions on optimal 
strategies in sharp focus and also for reasons of mathematical tractability,
we consider a simple fire distribution problem for a homogeneous Y force 
in Lanchester combat against heterogeneous X forces composed of two 
types of weapon systems. Our research approach is to study the same 
scenario (prescribed duration battle) using a deterministic combat attri- 
tion model and also a stochastic one and then to compare the corresponding 
optimal fire distribution policies. 

The solution to the deterministic problem is obtained using modern
optimal control theory (see [8], [27]). As discussed in [37] and [41],
the non-negativity restrictions on the force levels are state variable
inequality constraints (henceforth abbreviated as SVIC's) and require
special treatment (appropriate modification of the usual maximum principle^)
when active (see Chapter 6 of [27], [40]). In this paper we shall treat
SVIC's by the method of Speyer and Bryson [32] (see also [19], [24]) of
adjoining an SVIC directly to the return functional with a (Lagrange)
multiplier (see [41]). Unlike the corresponding terminal control problem
studied in [34], however, this "solution" requires several computer-assisted
computations for implementation.

The solution to the stochastic problem is obtained using the well-
known dynamic programming approach to optimal stochastic control [13], [21],
[12]. The basic equations of optimality (the fundamental functional
equation for the optimal expected-value function (see [12])) are developed.
We derive analytic solutions to these equations for very small numbers of
combatants and thus obtain the optimal closed-loop control. As is the
case for the Lanchester stochastic process (see [9], [20]), a general
solution for arbitrary numbers of combatants has not been obtained for
the fundamental functional equation (actually a system of differential-
difference equations), although solutions for specific (small) numbers of
combatants are readily obtained. Therefore, we have used finite-difference
methods to generate an approximate numerical solution.

^In this paper we employ an equivalent statement of the Pontryagin maximum
principle [27] commonly used by engineers in the United States. There is a
minor sign difference (see p. 108 of [8]) between these versions.

The body of this paper is organized in the following fashion. First,
we review a few relevant facts about the Lanchester stochastic process.
Then we state the two optimal control problems that this paper compares.
The method of solving the deterministic problem is outlined. The basic
equations of optimality for the stochastic control problem are developed,
and obtaining an analytic solution to these equations is discussed. The
use of finite difference methods for generating a numerical solution is
described. Then we compare results obtained from the two models and dis-
cuss these results. The implications of these results for defense planners
and military operations analysts are pointed out.

2. The Lanchester Stochastic Process.

In 1914 in the British journal Engineering F. W. Lanchester [23]
postulated that under the conditions of "modern warfare" combat between two
homogeneous forces could be described by the equations^

$$\frac{dx}{dt} = -ay, \qquad \frac{dy}{dt} = -bx, \tag{1}$$

where a, b are commonly referred to as the Lanchester attrition-rate
coefficients and x(t), y(t) are force levels. During World War II,
B. Koopman suggested a reformulation of such a model in stochastic form
[25]. Subsequent work on stochastic models of combat attrition has been
by R. Snow [31], R. Brown [6], [7], G. Weiss [44], D. Smith [30], and
G. Clark [9]. The stochastic process corresponding to a model like (1)
has been called the Lanchester stochastic process by B. Koopman [20].

^See [45] for a discussion of the assumptions inherent in (1). A further
discussion of Lanchester-type equations of warfare can be found in [39].
Further references on deterministic Lanchester formulations can be found
there [39] or in [11].
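The deterministic square-law model (1) can be checked numerically. The following is a minimal sketch, assuming the standard closed-form solution of (1) in hyperbolic functions; the function name and the numerical values are hypothetical and serve only as an illustration. The quantity b x^2 - a y^2 is conserved along solutions of (1), which gives a simple consistency check.

```python
import math

def lanchester_square_law(x0, y0, a, b, t):
    """Closed-form force levels for dx/dt = -a*y, dy/dt = -b*x.

    Valid only while both x(t) and y(t) remain positive; equation (1)
    says nothing about what happens after one side is annihilated.
    """
    w = math.sqrt(a * b)
    c, s = math.cosh(w * t), math.sinh(w * t)
    x = x0 * c - y0 * math.sqrt(a / b) * s
    y = y0 * c - x0 * math.sqrt(b / a) * s
    return x, y

# Hypothetical illustration values, not taken from the paper.
x, y = lanchester_square_law(x0=10.0, y0=8.0, a=0.05, b=0.04, t=3.0)
print(x, y)
```

Both force levels decrease over time, while b x^2 - a y^2 keeps its initial value.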

Before considering the optimal stochastic control problem, it seems 
appropriate for us to review a few results for the Lanchester stochastic 
process. Consider combat between a homogeneous X force and a homogeneous 
Y force. Let us model this combat as a continuous parameter Markov chain 
with stationary transition probabilities (see pp. 188-189 of [26] for a 
further discussion of terminology). Let M(t) denote the (integer)
number of X combatants "alive" at time t after the battle begins, and
let N(t) denote the number of Y combatants.^ We denote the state proba-
bility by P(t,m,n), and thus

P(t,m,n) = Prob[M(t) = m, N(t) = n].



Making standard assumptions (see [5]), we find that the state probabilities
satisfy the following system of differential-difference equations
for $1 \le m \le m_0$ and $1 \le n \le n_0$^

$$\frac{dP}{dt}(t,m,n) = P(t,m+1,n)\,A(m+1,n) + P(t,m,n+1)\,B(m,n+1) - \{A(m,n) + B(m,n)\}\,P(t,m,n), \tag{2}$$

where $m_0$ ($n_0$) is the number of X (Y) combatants at the beginning of
battle at t = 0, i.e. $M(t=0) = m_0$ with probability one; A(m,n) is
the rate of attrition of the X forces with A(0,n) = 0; and B(m,n)
is the rate of attrition of the Y forces with B(m,0) = 0. In other
words, we have

$$\mathrm{Prob}[\text{one X casualty in time interval from } t \text{ to } t + \Delta t] = A(m,n)\,\Delta t.$$

(Moreover, P(t,m,n) is, more precisely, the transition probability

$$P(t,m,n) = P(t,m,n;\,t=0,m_0,n_0) = \mathrm{Prob}[M(t)=m,\ N(t)=n \mid M(t=0)=m_0,\ N(t=0)=n_0].)$$

^Random variables are denoted by capital letters, while their realizations
are denoted by the corresponding lower case letters.

^We adopt the convention that P(t,m,n) = 0 for either $m > m_0$ or $n > n_0$.



Of course, the state space is discrete, i.e. $m = 0,1,\ldots,m_0$ and
$n = 0,1,\ldots,n_0$. At state space boundaries, i.e. m = 0 or n = 0,
equation (2) takes the form

$$\begin{aligned}
\frac{dP}{dt}(t,m,0) &= P(t,m+1,0)\,A(m+1,0) + P(t,m,1)\,B(m,1) - P(t,m,0)\,A(m,0), \\
\frac{dP}{dt}(t,0,n) &= P(t,0,n+1)\,B(0,n+1) + P(t,1,n)\,A(1,n) - P(t,0,n)\,B(0,n), \\
\frac{dP}{dt}(t,0,0) &= P(t,1,0)\,A(1,0) + P(t,0,1)\,B(0,1).
\end{aligned} \tag{3}$$



Initial conditions for (2) and (3) are

$$P(t=0,m,n) = \begin{cases} 1 & \text{for } m = m_0,\ n = n_0, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
Let us adopt the following terminology for the attrition rates
(and hence the process itself). We say that we have a
(a) linear-law attrition process when

$$A(m,n) = amn, \qquad B(m,n) = bmn, \tag{5}$$

and (b) square-law attrition process when

$$A(m,n) = \alpha m + an, \qquad B(m,n) = bm + \beta n, \tag{6}$$

where $\alpha$, $\beta$ may be referred to as operational loss rates.
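For small numbers of combatants the system (2) through (4) can simply be integrated numerically. The sketch below is an illustration only: it assumes pure square-law rates (A = a n while X survives, B = b m while Y survives, operational losses omitted), uses a classical fourth-order Runge-Kutta step, and all function names and parameter values are hypothetical. Treating out-of-range probabilities as zero lets one routine cover both the interior equations (2) and the boundary equations (3).

```python
def square_law_rates(a, b):
    """Attrition rates with A(0,n) = 0 and B(m,0) = 0 (absorbing boundaries)."""
    def A(m, n):
        return a * n if m > 0 else 0.0
    def B(m, n):
        return b * m if n > 0 else 0.0
    return A, B

def rhs(P, A, B):
    """Right-hand side of (2)/(3); P[m][n] with P = 0 outside the grid."""
    m0, n0 = len(P) - 1, len(P[0]) - 1
    dP = [[0.0] * (n0 + 1) for _ in range(m0 + 1)]
    for m in range(m0 + 1):
        for n in range(n0 + 1):
            gain = 0.0
            if m < m0:
                gain += P[m + 1][n] * A(m + 1, n)   # an X casualty from (m+1, n)
            if n < n0:
                gain += P[m][n + 1] * B(m, n + 1)   # a Y casualty from (m, n+1)
            dP[m][n] = gain - (A(m, n) + B(m, n)) * P[m][n]
    return dP

def _shift(P, K, c):
    """Return P + c*K elementwise."""
    return [[p + c * k for p, k in zip(rp, rk)] for rp, rk in zip(P, K)]

def state_probabilities(m0, n0, A, B, t_end, steps=2000):
    """Integrate from the initial condition (4) to time t_end with RK4."""
    P = [[0.0] * (n0 + 1) for _ in range(m0 + 1)]
    P[m0][n0] = 1.0
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(P, A, B)
        k2 = rhs(_shift(P, k1, h / 2), A, B)
        k3 = rhs(_shift(P, k2, h / 2), A, B)
        k4 = rhs(_shift(P, k3, h), A, B)
        P = [[p + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
              for p, c1, c2, c3, c4 in zip(rp, r1, r2, r3, r4)]
             for rp, r1, r2, r3, r4 in zip(P, k1, k2, k3, k4)]
    return P

def mean_levels(P):
    """Expected numbers of X and Y survivors."""
    mean_m = sum(m * p for m, row in enumerate(P) for p in row)
    mean_n = sum(n * p for row in P for n, p in enumerate(row))
    return mean_m, mean_n

# Hypothetical example: 4 versus 4 combatants with equal rates.
A, B = square_law_rates(0.1, 0.1)
P = state_probabilities(4, 4, A, B, t_end=5.0)
print(mean_levels(P))
```

In this symmetric example the mean force level can be compared with the deterministic value 4 exp(-0.5) obtained from (1) for the same coefficients.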

Although it is well-known that (2) through (4) yield an exponential
solution (the Chapman-Kolmogorov equation expresses the semi-group property
of the state probabilities (see [20])) when A(m,n) and B(m,n) have
been specified (for example, by (6)), general solutions which apply for
all values of $m_0$ and $n_0$ have been obtained for this system only
in a few special cases. In the special case when $a + \alpha = b + \beta$, Isbell
and Marlow [18] developed a general solution to (2) through (4) for a
square-law stochastic attrition process. Recently, Clark (see pp. 102-104
of [9]) developed the general solution to the linear-law stochastic
attrition process (i.e. A(m,n) and B(m,n) are given by (5)).

One reason why we have reviewed this material is to point out
to the reader that a general solution to (2) through (4) exists only for
a linear-law attrition process and is very complex (see pp. 102-104 of [9]).
In considering the optimal control of the Lanchester stochastic (square-law)
process, we will encounter a similar system of equations for the optimal
expected-value function. Keeping in mind that a general solution has not
been obtained to the corresponding equations (2) through (4) for the state
probabilities of the square-law stochastic attrition process, the reader
will not be surprised to learn that we have not developed an analytic
solution for the general case of these equations.

Additionally, using the above noted solutions for the Lanchester
stochastic process, Clark (following results in [25] and qualitative results
in [31]) made comparisons [9] (see also Chapter 11 of [4]) of the average
force levels in the stochastic process (denoted as m(t) and n(t)) and
the corresponding force levels x(t) and y(t) in the deterministic
formulation (such as (1)). Unlike the corresponding situation for the
Yule-Furry linear birth process (see pp. 77-78 of [3] or pp. 156-159 of
[10]), there is a bias (due to "boundary effects") in the dynamical behavior
of x(t) and y(t) as compared with m(t) and n(t) for the same values
of a and b. It turns out that m(t) lies above x(t), and the amount
of separation grows over time.

The above is a major result of Clark’s careful investigation, in
which several numerical examples are given to illustrate such points. He con-
cludes that (see p. 11-19 of [4]) "the deterministic model would have
difficulty approximating a stochastic simulation" with respect to the time
history of force levels. Clark’s solution to the stochastic linear-law
process was important in making such a comparison. The fact that the
average of the Lanchester stochastic process does not behave identically
to the corresponding force levels x(t) and y(t) computed according to
the corresponding deterministic model has motivated the paper at hand.
3. The Optimal Control Problems.

In this section we state the two optimal control problems that are
considered in this paper. The deterministic optimal control problem
considered is

$$\underset{\phi_D(t)}{\text{maximize}}\ \{\,r\,y(t_f) - p\,x_1(t_f) - q\,x_2(t_f)\,\} \quad \text{with } t_{\max} \text{ specified},$$

subject to:

$$\frac{dx_1}{dt} = -\phi_D\, a_1 y, \qquad \frac{dx_2}{dt} = -(1-\phi_D)\, a_2 y, \qquad \frac{dy}{dt} = -b_1 x_1 - b_2 x_2, \tag{7}$$

$$x_1, x_2, y \ge 0, \qquad 0 \le \phi_D \le 1, \qquad \text{and } t_f \le t_{\max},$$

with initial conditions

$$x_1(t=0) = x_1^0, \qquad x_2(t=0) = x_2^0, \qquad y(t=0) = y_0,$$



where all symbols are explained in the Appendix. In this problem $x_1$, $x_2$,
and y are called state variables, while $\phi_D$ is called a control (or
decision) variable. A constraint such as $x_1 \ge 0$ is called a state
variable inequality constraint (SVIC) and requires special treatment (see
below).
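The dynamics in (7) are easy to explore numerically for a given fire distribution rule. The sketch below is a hypothetical illustration, not the method of this paper: it uses a plain Euler step, clamps force levels at zero to respect the SVIC's, and assumes that fire is redirected to the surviving X force once the other has been annihilated. The names `simulate_battle`, `payoff`, and `policy`, and all numerical values, are assumptions made for the example.

```python
def simulate_battle(x1, x2, y, a1, a2, b1, b2, t_max, policy, dt=1e-3):
    """Euler integration of (7) under phi_D = policy(t, x1, x2, y).

    The battle stops at t_max, or earlier if Y or both X forces are
    annihilated. Returns (t_f, x1, x2, y).
    """
    t = 0.0
    for _ in range(int(round(t_max / dt))):
        if y <= 0.0 or (x1 <= 0.0 and x2 <= 0.0):
            break
        phi = min(1.0, max(0.0, policy(t, x1, x2, y)))
        # Assumption: no fire is wasted on an annihilated target type.
        if x1 <= 0.0:
            phi = 0.0
        elif x2 <= 0.0:
            phi = 1.0
        dx1 = -phi * a1 * y
        dx2 = -(1.0 - phi) * a2 * y
        dy = -b1 * x1 - b2 * x2
        x1 = max(0.0, x1 + dt * dx1)
        x2 = max(0.0, x2 + dt * dx2)
        y = max(0.0, y + dt * dy)
        t += dt
    return t, x1, x2, y

def payoff(x1, x2, y, p, q, r):
    """Criterion of (7): value of Y survivors minus value of X survivors."""
    return r * y - p * x1 - q * x2

# Hypothetical example: Y concentrates all fire on X1 for the whole battle.
t_f, x1, x2, y = simulate_battle(10.0, 10.0, 30.0, 0.1, 0.05, 0.02, 0.02,
                                 t_max=1.0, policy=lambda t, x1, x2, y: 1.0)
print(t_f, x1, x2, y, payoff(x1, x2, y, p=1.0, q=1.0, r=1.0))
```

With all fire on $X_1$, the $X_2$ force level stays at its initial value while $x_1$ and y decrease.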



The battle lasts for $0 \le t \le t_{\max}$ unless, of course, one side or
the other is annihilated before $t_{\max}$. To be more precise, the battle
terminates under one of the three following circumstances:

(1) $x_1(t_f) = x_2(t_f) = 0$ and $t_f \le t_{\max}$,

(2) $y(t_f) = 0$ and $t_f \le t_{\max}$,

(3) $t_f = t_{\max}$,

where $t_f$ denotes the time at which the battle ends. Upon further
analysis, it has been convenient to consider that there are eight "terminal
states," or "target sets." These are shown in Table I. The reader should
note that for $S_4$ through $S_8$ the battle ends by the system (as described
by the three state variables $x_1$, $x_2$, and y) being driven to a prescribed
terminal state. For these terminal states, $t_f$ is undetermined when
$t_f < t_{\max}$, since it is then determined by entry to the terminal state,
and this depends upon the control used. For these cases a well-known
transversality condition must hold. The above problem (7) is called a
prescribed duration battle, since the battle lasts for a maximum duration
of $t_{\max}$, i.e. $t_f \le t_{\max}$.

The corresponding stochastic optimal control problem considered is

$$\text{maximize}\ E[\,r\,N(t_f) - p\,M_1(t_f) - q\,M_2(t_f)\,] \quad \text{with } t_f \text{ specified},$$

subject to: casualties occur randomly as a continuous
parameter Markov chain with stationary transition
probabilities corresponding to the deterministic
process (7), (8)

$$M_1, M_2, N \ge 0 \quad \text{and} \quad 0 \le \phi_S \le 1,$$

where the random variables $M_1(t)$, $M_2(t)$, N(t) are force levels
(integers), E[·] denotes mathematical expectation, and all other symbols
are explained in the Appendix. In (8) $\phi_S = \phi_S(t, m_1, m_2, n)$ denotes a
closed-loop control (see [16]). For the deterministic problem (7) we
have not been precise about this point, since it is well-known that open-
loop control (e.g. $\phi_D = \phi_D(t; x_1^0, x_2^0, y_0)$) and closed-loop control
(e.g. $\phi_D = k(t, x_1, x_2, y)$) are equivalent and yield identical results in
trajectory and payoff [16]. For stochastic control problems this equiva-
lence is, of course, not true (see [12]).
Table I. Definition of Terminal States for Deterministic
Optimal Control Problem (Prescribed Duration
Battle).

$S_1$: $x_1(t_f) > 0$, $x_2(t_f) > 0$, $y(t_f) > 0$, $t_f = t_{\max}$

$S_2$: $x_1(t_f) = x_1(t_1) = 0$, $x_2(t_f) > 0$, $y(t_f) > 0$, $t_f = t_{\max}$, where $t_1 < t_f$

$S_3$: $x_1(t_f) = x_1(t_3) > 0$, $x_2(t_f) = 0$, $y(t_f) > 0$, $t_f = t_{\max}$, where $t_3 < t_f$

$S_4$: $x_1(t_f) > 0$, $x_2(t_f) > 0$, $y(t_f) = 0$, $t_f \le t_{\max}$

$S_5$: $x_1(t_f) = x_1(t_1) = 0$, $x_2(t_f) > 0$, $y(t_f) = 0$, $t_f \le t_{\max}$, where $t_1 < t_f$

$S_6$: $x_1(t_f) = x_1(t_2) > 0$, $x_2(t_f) = 0$, $y(t_f) = 0$, $t_f \le t_{\max}$, where $t_2 < t_f$

$S_7$: $x_1(t_f) = x_1(t_1) = 0$, $x_2(t_f) = 0$, $y(t_f) > 0$, $t_f \le t_{\max}$, where $t_1 < t_f$

$S_8$: $x_1(t_f) = 0$, $x_2(t_f) = x_2(t_4) = 0$, $y(t_f) > 0$, $t_f \le t_{\max}$, where $t_4 < t_f$
4. Determination of an Optimal Policy for the Deterministic Problem.

In this section we outline how an optimal policy (expressed as a
closed-loop control) may be determined for (7). In order to keep the
length of the paper at hand within reasonable limits, we will only be able
to highlight the main points. Details which are available elsewhere in
the open literature will be omitted, and the entire "solution" will not
be given here.^

4.1. Outline of Solution Procedure.

Before giving our solution algorithm, it seems appropriate to define
some terms. We have then

Definition 1: By an extremal path we mean a path on which the necessary
conditions of optimality are everywhere satisfied (we use
the word "everywhere," since we take the class of admissible
controls to be the space of piecewise-continuous functions).

Definition 2: By an extremal control we mean the control used in order
that the system follow an extremal path.

Definition 3: By the domain of controllability for extremals to a given
terminal state we mean that subset of the initial state
space from which extremals lead to the terminal state.

Definition 4: By the synthesis of an extremal control we mean using the
basic necessary conditions of optimality to explicitly
determine the time history of an extremal control from
initial to terminal time as a function of initial conditions.

^Complete results in a form suitable for numerical determination are to be
found in Appendix G of [43]. The "solution" occupies twenty pages in [43],
and this should explain why for the purposes of the paper at hand only
representative results are given.
Our solution algorithm then is as follows:

(a) an extremal control law is developed from the maximum principle
(which must be modified when the trajectory lies on the boundary of
the state space); for Lanchester "square-law" attrition structures
the extremal control law in many cases depends only on relationships
between dual variables (marginal returns from destroying targets),

(b) for each terminal state an extremal control is synthesized by
combining a backwards integration of the adjoint system of differential
equations with the extremal control law and corner conditions,

(c) for each terminal state the domain of controllability for extremals
is determined by forwards integration of the state equations using
the synthesized extremal control from (b),

(d) the solution is determined at this point for regions of the initial
state space which are covered by only (part of) the domain of
controllability for extremals to one terminal state; one must also verify
that the entire initial state space has been accounted for, since
otherwise one may have overlooked some type of "singular" surface,

(e) if domains of controllability overlap so that for a point of the
initial state space contained in their intersection there is more
than one extremal leading to the terminal surface, then one computes
the return (or payoff) associated with each extremal; the optimal
trajectory is selected from the extremals by comparing these values.
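The comparison in step (e) can be illustrated with a crude numerical stand-in. The sketch below does not synthesize extremals from the maximum principle; it merely assumes a family of candidate policies (fire at $X_1$ until a switch time, then at $X_2$), evaluates the payoff of (7) for each by Euler integration, and keeps the best. The function names, the candidate grid, and the parameter values are all hypothetical.

```python
def payoff_for_switch(t_switch, x1, x2, y, a1, a2, b1, b2, p, q, r,
                      t_max, dt=1e-3):
    """Payoff r*y - p*x1 - q*x2 at battle end for a single-switch policy.

    Y fires at X1 (phi = 1) for t < t_switch and at X2 (phi = 0) afterwards.
    Force levels are clamped at zero; fire is redirected once a target
    type has been annihilated (an assumption of this sketch).
    """
    for k in range(int(round(t_max / dt))):
        if y <= 0.0 or (x1 <= 0.0 and x2 <= 0.0):
            break
        phi = 1.0 if k * dt < t_switch else 0.0
        if x1 <= 0.0:
            phi = 0.0
        elif x2 <= 0.0:
            phi = 1.0
        x1, x2, y = (max(0.0, x1 - dt * phi * a1 * y),
                     max(0.0, x2 - dt * (1.0 - phi) * a2 * y),
                     max(0.0, y - dt * (b1 * x1 + b2 * x2)))
    return r * y - p * x1 - q * x2

def best_switch_time(candidates, **model):
    """Return (payoff, switch time) of the best candidate policy."""
    return max((payoff_for_switch(ts, **model), ts) for ts in candidates)

# Hypothetical data with a1*b1 > a2*b2 and a1*p >= a2*q.
model = dict(x1=10.0, x2=10.0, y=30.0, a1=0.1, a2=0.05, b1=0.02, b2=0.02,
             p=1.0, q=1.0, r=1.0, t_max=1.0)
print(best_switch_time([k / 10 for k in range(11)], **model))
```

For the hypothetical values used here the best candidate is the one that never switches away from $X_1$ during the battle.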

The above solution algorithm is a refinement of the one presented
in [34]. Let us make a few remarks about the application of this procedure
to the prescribed duration battle.^ For this problem we may think of
time as being an additional state variable. On the other hand, for the
Isbell-Marlow terminal control problem [34] time may be considered as being
a parameter and consequently was eliminated for the determinations of step
(c) above. In other words, for the Isbell-Marlow problem a domain of
controllability was determined by inequalities involving the three state
variables; for the prescribed duration battle (7) such a determination
involves the four variables $t_{\max}$, $x_1^0$, $x_2^0$, and $y_0$.

^For this approach to work it is essential that an optimal policy exist for
(7). This has previously been established in [37], [41]. In this case
one of the extremals must be an optimal trajectory.

For the prescribed duration battle we have not been able in all
cases to develop analytic expressions at step (c) in the above algorithm
as we did for the terminal control problem studied in [34]. Consequently,
we could not analytically accomplish steps (d) and (e) for the problem at
hand. We have, however, used computational methods to determine the optimal
control. We have expressed our "solution" (partially presented in the next
section) so that given a point $P^0 = (x_1^0, x_2^0, y_0)$ in the state space
and $t_{\max}$, one can determine which terminal states are reached by extremals.
Thus, we can determine to which domains of controllability $P^0$ belongs.
Then, using the extremal control, we can numerically compute the return
(or payoff) associated with each extremal and select the optimal policy
from among a finite number of possibilities. A computer program was written
in FORTRAN to do the above, and computations were performed on an IBM 360
computer.

4.2. Summary of Solution.

We have applied the solution procedure of Section 4.1 to develop
a "solution" in the sense discussed there. Without loss of generality we
assume that $a_1 b_1 > a_2 b_2$, i.e. $R > 1$. There are two cases to be considered:

(1) $\delta \ge 1$,

and (2) $0 \le \delta < 1$,

where $\delta = a_1 p/(a_2 q)$.
For Case (1): $\delta \ge 1$, the domains of controllability do not overlap
each other, and hence extremals are unique. The extremal control
is thus the optimal control. The optimal policy, moreover, may be expressed
in a particularly simple form: always concentrate all fire on $X_1$ while
$x_1 > 0$. Further details on domains of controllability and "event" times
are to be found in Table II of [43].

For Case (2): $0 \le \delta < 1$, some domains of controllability overlap
each other, and hence extremals are not unique (in the sense that from a
point in the initial state space the system may be steered along any one
of several extremals to various end states of battle). (See [41] for a
discussion of a similar case.) Thus, considerations "in the large" (i.e.
step (e) of the above solution procedure) are required to determine the
optimal policy. Unfortunately, explicit analytic expressions are not
readily obtainable as they were for the Isbell-Marlow terminal control
problem [34]. However, as discussed in Section 4.1 above, one can use the
information presented in Table III of [43] (which is fifteen pages long)
to numerically determine an optimal fire distribution policy for any specific
set of model input parameter values. A representative sample of this
information is given in Table II.

In Case (2) the optimal fire distribution policy cannot be expressed
in the very simple form as in the first case. When Y wins in time less
than $t_{\max}$ ($S_7$, for which the optimal policy is determined), the optimal
fire distribution policy is precisely the same as when $\delta \ge 1$. However, for
all other cases (i.e. terminal states $S_1$ through $S_6$) the extremal policy
is to finish the prescribed duration battle by firing at $X_2$, regardless of
whether or not $X_1$ has been annihilated. This differs from that when $\delta \ge 1$.
Thus, we see that force levels affect the optimal fire distribution policy.
Table II. A Representative Part of the Solution
to the Prescribed Duration Battle for
$0 \le \delta < 1$.

(Nonrestrictive assumption: $R > 1$, i.e. $a_1 b_1 > a_2 b_2$)

$S_5$: $x_1(t_f) = x_1(t_1) = 0$, $x_2(t_f) > 0$, $y(t_f) = 0$, $t_f \le t_{\max}$

Extremal Control:

$$\phi_D^*(t) = \begin{cases} 1 & \text{for } 0 \le t \le t_1 \text{ where } x_1(t_1) = 0, \\ 0 & \text{for } t_1 < t \le t_f. \end{cases}$$

Domain of Controllability:

$$a_1 b_1 y_0^2 > s^2 - (b_2 x_2^0)^2,$$

$$a_1 b_1 y_0^2 < s^2 + (R-1)(b_2 x_2^0)^2,$$

$$t_f = t_1 + \frac{1}{\sqrt{a_2 b_2}} \tanh^{-1}\!\left( \frac{\sqrt{a_1 b_1 y_0^2 - s^2 + (b_2 x_2^0)^2}}{b_2 x_2^0 \sqrt{R}} \right) \le t_{\max},$$

where $t_1$ is given by

(1) for $a_1 b_1 y_0^2 > s^2$

$$t_1 = \frac{1}{\sqrt{a_1 b_1}} \ln\!\left( \frac{\sqrt{a_1 b_1 y_0^2 - s^2 + (b_2 x_2^0)^2} - b_2 x_2^0}{\sqrt{a_1 b_1}\, y_0 - s} \right),$$

(2) for $a_1 b_1 y_0^2 < s^2$

$$t_1 = \frac{1}{\sqrt{a_1 b_1}} \ln\!\left( \frac{b_2 x_2^0 - \sqrt{a_1 b_1 y_0^2 - s^2 + (b_2 x_2^0)^2}}{s - \sqrt{a_1 b_1}\, y_0} \right),$$

(3) for $a_1 b_1 y_0^2 = s^2$

$$t_1 = \frac{1}{\sqrt{a_1 b_1}} \ln\!\left( \frac{s}{b_2 x_2^0} \right).$$



NOTE: for 0 £ <5 < r - /R(R-l) optimal paths also satisfy (equality 

yielding a dispersal surface ) 
for OixJ < (b 2 x 2 )/(kb 1 ) 



a l b l y O * Ra2 



1? 



U ° [z 2 (R-1) ± Rl + h °l 2 

IV' in + b 2 x 2 ( * 



where k is given by k - (z 2 - R(z-l) 2 }/ ( 2 R) 
4.3. Development of Basic Necessary Conditions of Optimality.

We will use Speyer and Bryson’s approach [32]^ of adjoining the
state variable constraint directly to the criterion functional with a
Lagrange multiplier. The Hamiltonian is given by (see also [19])

$$H(t,x,p,\phi_D) = -p_1 \phi_D a_1 y - p_2 (1-\phi_D) a_2 y - p_3 (b_1 x_1 + b_2 x_2) + \eta_1(t)\, x_1 + \eta_2(t)\, x_2, \tag{9}$$

where

$$\eta_i(t) \begin{cases} = 0 & \text{for } x_i > 0, \\ \ge 0 & \text{for } x_i = 0. \end{cases}$$

The adjoint system of differential equations for the dual variables is

$$\frac{dp_1}{dt} = -\frac{\partial H}{\partial x_1} = b_1 p_3 - \eta_1(t), \tag{10}$$

$$\frac{dp_2}{dt} = -\frac{\partial H}{\partial x_2} = b_2 p_3 - \eta_2(t), \tag{11}$$

$$\frac{dp_3}{dt} = -\frac{\partial H}{\partial y} = \phi_D^* a_1 p_1 + (1-\phi_D^*)\, a_2 p_2. \tag{12}$$

Boundary conditions for the dual variables (also frequently called trans-
versality conditions) are discussed below. When $t_f < t_{\max}$ the following
transversality condition also holds

$$H(t=t_f, x, p, \phi_D^*) = 0. \tag{13}$$

When $x_1, x_2 > 0$, the maximum principle yields the extremal control
law [34], [41]

$$\phi_D^*(t) = \begin{cases} 1 & \text{for } v(t) > 0, \\ 0 & \text{for } v(t) < 0, \end{cases} \tag{14}$$

where $v(t) = (-p_1)a_1 - (-p_2)a_2$. We have shown that there are no
singular subarcs (see Chapter 8 in [8]) in the solution.

^Taylor apparently is the only person to apply these important results to
variational problems in operations research. See [41] for discussion of
previous applications.

Without loss of generality, let us consider a constrained subarc
on which $x_1(t) = 0$ for $t_1 \le t \le t_f$ (and $x_1, y > 0$ for $t < t_1$). Since
$dx_1/dt = 0$, the control is clearly $\phi_D^*(t) = 0$ for $t_1 \le t \le t_f$. The
requirement that $\partial H/\partial \phi_D = 0$ yields the following relationship between
dual variables on the constrained subarc^

$$a_1 p_1(t) = a_2 p_2(t). \tag{15}$$

The multiplier $\eta_1(t)$ is determined from the condition that

$$\frac{d}{dt}\!\left(\frac{\partial H}{\partial \phi_D}\right) = 0,$$

and this yields

$$\eta_1(t) = \frac{p_3(t)}{a_1}\,(a_1 b_1 - a_2 b_2). \tag{16}$$

The interpretation of $\eta_1(t)$ (see [41] for a further discussion) is the
rate of marginal return to Y for keeping $x_1 = 0$. Thus, (intuitively)
Y tries to annihilate $X_1$ only when it profits him to do so. Further-
more, the requirement that $\eta_1(t) \ge 0$ when $x_1 = 0$ for a finite interval
of time yields that we must have

$$a_1 b_1 \ge a_2 b_2, \tag{17}$$

since it may be shown that $p_3(t) > 0$ for $t < t_f$. The nonrestrictive
assumption that $a_1 b_1 > a_2 b_2$ (i.e. $R > 1$) implies that it is nonoptimal
to have $x_2 = 0$ for a finite interval of time.

Furthermore, when the necessary conditions of optimality are expressed
in Speyer and Bryson’s format [32] (see also [19]), the corner conditions
(see pp. 125-126 of [8]) take a particularly simple form for a first order
SVIC: the adjoint variables are continuous across all corners (both
interior to and on the boundary of the state space). In other words

$$p(t_c^-) = p(t_c^+), \tag{18}$$

where $t_c^-$ denotes the time just before the corner (i.e. a left-hand limit).
We also have that

$$H(t_c^-, x(t_c), p(t_c^-), \phi_D^*(t_c^-)) = H(t_c^+, x(t_c), p(t_c^+), \phi_D^*(t_c^+)). \tag{19}$$

On entry to a constrained subarc with $x_1(t) = 0$ for $t_1 \le t \le t_f$, (19)
yields

$$a_1 p_1(t_1^-) = a_1 p_1(t_1^+) = a_2 p_2(t_1^+) = a_2 p_2(t_1^-). \tag{20}$$

^The development of (15) requires a slightly different argument when $t = t_f$
and $y(t_f) = 0$. See [41] for a further discussion of this point.

Let us finally consider the boundary conditions for the dual
variables at $t = t_f$. The nonrestrictive assumption that $a_1 b_1 > a_2 b_2$
yields that no extremals lead to $S_8$. The three terminal states $S_1$, $S_2$,
and $S_3$ may be discussed collectively. In all three cases the length of
the battle is equal to $t_{\max}$. Then, according to the results presented
in [42], we have

for $S_1$, $S_2$, and $S_3$:

$$p_1(t_f) = -p + \nu_1, \qquad p_2(t_f) = -q + \nu_2, \qquad p_3(t_f) = r > 0, \tag{21}$$

where

$$\nu_i \begin{cases} = 0 & \text{for } x_i(t_f) > 0, \\ \ge 0 & \text{for } x_i(t_f) = 0 \text{ but } x_i(t) > 0 \text{ for } t < t_f, \\ \text{unrestricted} & \text{for } x_i(t_f) = 0 \text{ and } x_i(t) = 0 \text{ for } t_i \le t \le t_f \text{ with } t_i < t_f. \end{cases} \tag{22}$$



The latter condition that, for example, the multiplier $\nu_i$ is unrestricted
when the system is on a constrained subarc for a finite interval of time
is because the boundary of the state space is "absorbing" (i.e. the state
constraint $x_i \ge 0$ essentially acts like a terminal equality constraint
as far as the determination of boundary conditions for the adjoint variables
is concerned [42]). If there were replacements in the model (7) so that the
boundary of the state space would not be "absorbing," then we would have
$\nu_i \ge 0$ for $x_i(t_f) = 0$.

For $S_4$, $S_5$, and $S_6$ the duration of the battle $t_f$ is determined
by the terminal equality constraint $y(t_f) = 0$ when $t_f < t_{\max}$ so that
the transversality condition (13) yields $p_3(t_f) = 0$. When $t_f = t_{\max}$,
additional analysis is required, and this is discussed in Section 4.4
below. Then, again according to the results presented in [42], we have

for $S_4$, $S_5$, and $S_6$:

$$p_1(t_f) = -p + \nu_1, \qquad p_2(t_f) = -q + \nu_2, \qquad p_3(t_f) = 0, \tag{23}$$

where the multipliers $\nu_i$ for i = 1,2 are again given by (22).

For $S_7$: $x_1(t_f) = x_2(t_f) = 0$, $y(t_f) > 0$, $t_f \le t_{\max}$, we have [8]

$$p_1(t_f) = -p + \nu_1, \qquad p_2(t_f) = -q + \nu_2, \qquad p_3(t_f) = r > 0, \tag{24}$$

since $t_f$ is determined by the (equality) terminal constraints $x_1(t_f) = 0$
and $x_2(t_f) = 0$. Since these are equality constraints, the multipliers
$\nu_1$ and $\nu_2$ are unrestricted in sign. Since $t_f$ is unspecified, the
transversality condition (13) with $\phi_D^*(t_f) = 0$ yields that
$a_2 p_2(t_f)\, y(t_f) = 0$, so that $p_2(t_f) = 0$ and $\nu_2 = q$. The condition (15)
which, in particular, holds at $t = t_f$ yields that $p_1(t_f) = 0$. Thus, we have

for $S_7$ [$x_1 = 0$ before $x_2(t_f) = 0$, $y(t_f) > 0$]:

$$p_1(t_f) = 0, \qquad p_2(t_f) = 0, \qquad p_3(t_f) = r. \tag{25}$$

4.4. Synthesis of Extremal Control.

For each terminal state, extremals may be synthesized by combining
the conditions which must hold on a constrained subarc and the extremal
control law (14) with a backwards integration of the adjoint equations (10),
(11) and (12). The boundary conditions for the adjoint variables given
in Section 4.3 and the corner conditions (18) and (19) are used in this
backwards sweep process. It is convenient to use the switching function
$v(t) = (-p_1)a_1 - (-p_2)a_2$ in synthesizing extremals. Using (10) and (11),
we readily find that for $t < t_f$

$$\frac{dv}{dt} = p_3(t)\,(-a_1 b_1 + a_2 b_2) < 0, \tag{26}$$

since $p_3(t) > 0$ for $t < t_f$.

Details in the synthesis of extremals are similar to those presented
in [34]-[38], [41], and [43] and hence they are omitted.^ The treatment
in [37] is most similar to the problem at hand. Details for $\delta \ge 1$ and
for $0 \le \delta < 1$ are different.

^In some of these references the non-negativity of the force levels (i.e.
SVIC's) have been treated by means other than Speyer and Bryson's approach
[8]. The basic principles of working backwards from the end, however, are
the same in all applications.

There are two interesting aspects, moreover, that we encountered
in synthesizing extremals. These are
(a) when $0 \le \delta < 1$ and a switch in the target type upon which all Y-
fire is concentrated occurs without the annihilation of a target type,
the switching time depends upon the initial force levels and possibly
the valuation of Y survivors, and

(b) when $P^0 = (x_1^0, x_2^0, y_0)$ is such that when $\delta < 1$ an extremal leads
to $S_4$ (i.e. we reach $S_4$ with a switch in tactics) with $t_f(S_4) < t_{\max}$,
we can possibly also steer the system to an end point with
$y(t_f = t_{\max}) = 0$ without violating any necessary conditions of optimality.

Let us first discuss the dependence of the non-annihilation switching
time on force levels and valuation of Y survivors. Such a switch in
fire distribution only happens for $\delta < 1$. Let us compare the situations
for extremals leading to $S_1$ and $S_4$. In both cases we have

$$\phi^*(t) = \begin{cases} 1 & \text{for } 0 \le t \le t_f - \tau_1, \\ 0 & \text{for } t_f - \tau_1 < t \le t_f, \end{cases} \qquad (27)$$

where $x_1(t = t_f - \tau_1) > 0$. It is convenient to introduce the "backwards
time" $\tau$ defined by $\tau = t_f - t$. Then when $\delta < 1$, we have
$\phi_D^*(\tau) = 0$ for $0 \le \tau \le \tau_1$, where $\tau_1$ denotes the
backwards time of the first switch in fire distribution.
For $S_4$ [$x_1(t_f) > 0$, $x_2(t_f) > 0$, $y(t_f) = 0$, $t_f < t_{max}$],
it may be shown using (10)-(12), (14), (23), and (26) that

$$\tau_1(S_4) = \frac{1}{\sqrt{a_2 b_2}} \cosh^{-1} z, \qquad (28)$$

where $z = (R - \delta)/(R - 1)$.
For $S_1$ [$x_1(t_f) > 0$, $x_2(t_f) > 0$, $y(t_f) > 0$, $t_f = t_{max}$],
it may be shown that

$$\tau_1(S_1) = \frac{1}{\sqrt{a_2 b_2}} \ln\left( \frac{z + \sqrt{z^2 + \alpha^2 - 1}}{1 + \alpha} \right), \qquad (29)$$

where

$$\alpha = \frac{r}{q} \sqrt{\frac{b_2}{a_2}}. \qquad (30)$$

[Footnote: Further details of the results summarized in this section are to
be found in [43]. To keep the paper at hand from being too long, we have
omitted them.]
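The two switching-time expressions can be evaluated numerically. The following is a minimal Python sketch, not the authors' program. It assumes the forms $\tau_1 = \cosh^{-1}(z)/\sqrt{a_2 b_2}$ for (28) and the logarithmic form with $\alpha = (r/q)\sqrt{b_2/a_2}$ for (29)-(30), and it further assumes the definitions $R = a_1 b_1/(a_2 b_2)$ and $\delta = a_1 p/(a_2 q)$, which are not stated in this section. The function names are hypothetical.

```python
import math

def tau1_zero_value(a2, b2, z):
    """Backwards switching time of the form (28): acosh(z) / sqrt(a2*b2)."""
    return math.acosh(z) / math.sqrt(a2 * b2)

def tau1_valued(a2, b2, z, alpha):
    """Backwards switching time of the form (29), with valuation parameter alpha."""
    root = math.sqrt(z * z + alpha * alpha - 1.0)
    return math.log((z + root) / (1.0 + alpha)) / math.sqrt(a2 * b2)

# Illustrative values taken from Parameter Set 1 of Table V.
a1, a2, b1, b2, p, q, r = 0.025, 0.015, 0.035, 0.005, 0.75, 2.25, 2.0
R = (a1 * b1) / (a2 * b2)        # assumed definition of R
delta = (a1 * p) / (a2 * q)      # assumed definition of delta
z = (R - delta) / (R - 1.0)
alpha = (r / q) * math.sqrt(b2 / a2)

tau_s4 = tau1_zero_value(a2, b2, z)
tau_s1 = tau1_valued(a2, b2, z, alpha)
t_max = 50.0
forward_switch_time = t_max - tau_s1   # elapsed time of the switch when t_f = t_max
```

Setting `alpha = 0` in `tau1_valued` reproduces `tau1_zero_value`, since $\ln(z + \sqrt{z^2 - 1}) = \cosh^{-1} z$. Under the stated assumptions, `forward_switch_time` comes out close to the deterministic switching time of 41.28 minutes listed for Parameter Set 1 in Table VI.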



The following theorem is of interest (see [36] for a similar result).

THEOREM 1: Assume that $R > 1$ and $\delta < 1$. Then,

$$\tau_1(S_1) < \tau_1(S_4).$$

A proof of Theorem 1 is given in [43]. Furthermore, it is readily shown
that $\lim_{r \to +\infty} \tau_1(S_1) = 0$. Thus, when $\delta < 1$, the switching
time $t = t_f - \tau_1$ along extremals leading to $S_1$ explicitly depends on
the value Y places upon the survival of his own forces. The higher he values
Y-force survivors, the longer Y forces concentrate their fire on $X_1$ when
$\delta < 1$. For extremals leading to $S_4$, the transversality condition (13)
yields that Y-force survivors have zero value. Intuitively, we see that
firing longer at $X_1$ prolongs the length of battle for those cases when
$y(t_f) = 0$, since $a_1 b_1 > a_2 b_2$. However, for extremals leading to $S_4$
this is not an optimal tactic.

Let us therefore consider the case when $t_f = t_{max}$ for $S_4$. We
just discussed above the possibility when $R > 1 > \delta$ of prolonging the
length of battle along an extremal leading to $S_4$ by firing longer at
$X_1$. Using (27), it may be shown that

$$y(t_f) = y(t_f - \tau_1)\cosh\left(\sqrt{a_2 b_2}\,\tau_1\right) - \frac{b_1 x_1(t_f - \tau_1) + b_2 x_2(t_f - \tau_1)}{\sqrt{a_2 b_2}} \sinh\left(\sqrt{a_2 b_2}\,\tau_1\right), \qquad (31)$$

where

$$\tau_1 = \tau_1(\alpha) = \frac{1}{\sqrt{a_2 b_2}} \ln\left( \frac{z + \sqrt{z^2 + \alpha^2 - 1}}{1 + \alpha} \right), \qquad (32)$$

and

$$\alpha = \frac{(r + \nu)}{q} \sqrt{\frac{b_2}{a_2}}, \qquad (33)$$

where $\nu$ is the multiplier corresponding to the terminal constraint
$y(t_f) = 0$. Then, the following lemma may be established [43].

LEMMA 1: Consider an extremal leading to $S_4$ with $y(t_f)$
given by (31) and $t_f$ defined by $y(t_f) = 0$. Then



3y(t f ) 



< 0 if and only if < * 



In [43] it is shown that by increasing the implicit valuation of Y
survivors (i.e. $\nu$ in (33)) the length of battle may be extended until
$t_f = t_{max}$. However, this is not an optimal policy. This situation in
which a special case (here $t_f = t_{max}$ for $S_4$) requires an inordinate
amount of analysis unfortunately has arisen in all problems that we have
studied.

4.5. Obtaining an Optimal Policy.

After extremals have been synthesized, domains of controllability
for extremals may be obtained as shown in [34]. It then remains to apply
steps (d) and (e) of the solution procedure given in Section 4.1. A
computer program written in FORTRAN has been developed to assist in the
determination of an optimal policy. This computer program does the
following: for a given point in the initial state space, it determines to
which terminal states extremals lead. Then, the payoff corresponding to each
extremal is computed. The optimal path (and hence the optimal policy) is
readily obtained by determining which extremal yields the largest return
to Y.

In the above fashion, the optimal fire distribution policy may be
obtained as an open-loop control. After this has been obtained, it is a
straightforward matter to express the optimal policy as a closed-loop
control. In doing this, it is convenient to cite the principle of optimality
[1] (a special case of Isaacs' tenet of transition [17] (see also [2])),
i.e. every subarc of an optimal trajectory is itself an optimal trajectory.



5. Determination of an Optimal Policy for Stochastic Problem.

In this section we discuss how an optimal fire distribution policy 
(expressed as a closed-loop control) may be determined for (8) . Using 
the formalism of dynamic programming, we develop the fundamental functional 
equation for the optimal expected value function. This is a sufficient 
condition of optimality: a control which leads to the satisfying of this 

equation is an optimal policy (see [29]). An analytic solution is developed 
to the fundamental functional equation for very small numbers of combatants. 
Finite difference methods are used, however, to generate a numerical 
approximate solution, since a general solution (for arbitrary numbers of 
combatants) has not been obtained to the fundamental functional equation. 

5.1. Development of Fundamental Functional Equation.

Let $S(\tau, m_1, m_2, n)$ denote the optimal expected-value function (see
[12]). Then

$$S(\tau, m_1, m_2, n) = \max_{\phi_S \in \Phi} E\left[\, r N(\tau{=}0) - p M_1(\tau{=}0) - q M_2(\tau{=}0) \,\right], \qquad (34)$$

where

   the system state is $m_1, m_2, n$ at time $\tau$ (i.e. $M_1(\tau) = m_1$, etc.),

   $\Phi$ is the class of admissible controls (i.e. $\phi_S$ must always be
   chosen from the set of rational numbers
   $\{0, \tfrac{1}{n(\tau)}, \tfrac{2}{n(\tau)}, \ldots, 1\}$),

   $\tau = t_f - t$ is the "backwards time" from the end of battle (which
   begins at $t = 0$),

   $E$ denotes mathematical expectation given that
   $\mathbf{m}(\tau) = (m_1(\tau), m_2(\tau), n(\tau))$,

   casualties occur in a random fashion between $t$ and $t_f$.

In other words, $S(\tau, m_1, m_2, n)$ is the maximum return that we get on the
average when we start with force levels $m_1$, $m_2$, and $n$ at
$t = t_f - \tau$, follow an optimal policy $\phi_S^*(s, m_1, m_2, n)$ (chosen
from the class of admissible policies $\Phi$) for $t \le s \le t_f$, and
casualties occur in a random fashion.

We consider that casualties occur as a Markov process with discrete
state space (or discontinuous Markov process). Specifically, we assume
that

(1) the attrition process is a continuous parameter Markov chain with
    stationary transition probabilities corresponding to a deterministic
    Lanchester square-law attrition process; this is equivalent to
    assuming

    (a) the future occurrences of casualties depend only on the state
        of the system at $t$ and not on past history,

    (b) the transition probabilities depend on only the state of the
        system,

    (c) Prob[one $X_1$ casualty in interval $\Delta t$] $= \phi\, a_1 n\, \Delta t$,

        Prob[one $X_2$ casualty in interval $\Delta t$] $= (1 - \phi)\, a_2 n\, \Delta t$,

        Prob[one $Y$ casualty in interval $\Delta t$] $= (b_1 m_1 + b_2 m_2)\, \Delta t$,

        where $\phi a_1 n$ is the $X_1$ casualty rate, etc.,

    (d) Prob[more than one casualty in interval $\Delta t$] $= O((\Delta t)^2)$,

        where $O(x)$ denotes dependence on $x$ such that
        $\lim_{x \to 0} O(x)/x = \text{const.}$,

(2) the Y-forces have perfect information as to the state of the system
    at $t$ and the expected casualty rates,

(3) the Y-forces can instantaneously shift fire from any target at any
    time,

(4) the length of the battle is known.

Then, we have

    state variables: $M_1(t)$, $M_2(t)$, $N(t)$,

    decision (or control) variable: $\phi_S$,

where

$$\phi_S \in \Phi = \left\{ 0, \frac{1}{n(t)}, \frac{2}{n(t)}, \ldots, \frac{n(t) - 1}{n(t)}, 1 \right\}.$$

To be more precise, $\phi_S(t, m_1, m_2, n)$ is a closed-loop (or feedback)
control.
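Assumptions (1)(a) through (1)(d) describe a continuous parameter Markov chain whose single-casualty rates are $\phi a_1 n$, $(1-\phi) a_2 n$, and $b_1 m_1 + b_2 m_2$. One realization of such a process can be drawn with the standard exponential-holding-time construction. The sketch below is only an illustration in Python (the authors' programs were written in FORTRAN); the function names are hypothetical, and it makes the simplifying assumption that the feedback policy is re-evaluated only at casualty times rather than continuously.

```python
import random

def simulate_battle(m1, m2, n, t_f, rates, policy, rng):
    """Draw one realization of the attrition chain up to time t_f.

    rates  = (a1, a2, b1, b2)
    policy = function (tau, m1, m2, n) -> phi in [0, 1], tau = backwards time
    """
    a1, a2, b1, b2 = rates
    t = 0.0
    while True:
        # With only one X target type left there is no allocation decision.
        if m1 > 0 and m2 > 0:
            phi = policy(t_f - t, m1, m2, n)
        elif m1 > 0:
            phi = 1.0
        else:
            phi = 0.0
        rate_x1 = phi * a1 * n if m1 > 0 else 0.0
        rate_x2 = (1.0 - phi) * a2 * n if m2 > 0 else 0.0
        rate_y = (b1 * m1 + b2 * m2) if n > 0 else 0.0
        total = rate_x1 + rate_x2 + rate_y
        if total <= 0.0:
            break                      # absorbing state: nobody can fire
        t += rng.expovariate(total)    # exponential time to the next casualty
        if t >= t_f:
            break                      # battle ends before the next casualty
        u = rng.random() * total       # choose which side takes the casualty
        if u < rate_x1:
            m1 -= 1
        elif u < rate_x1 + rate_x2:
            m2 -= 1
        else:
            n -= 1
    return m1, m2, n

def payoff(m1, m2, n, p, q, r):
    """Terminal return to Y: r*n - p*m1 - q*m2."""
    return r * n - p * m1 - q * m2

# Example: always concentrate fire on X1 while both X target types survive.
final_state = simulate_battle(2, 5, 3, 50.0, (0.025, 0.015, 0.035, 0.005),
                              lambda tau, m1, m2, n: 1.0, random.Random(1))
```

Averaging `payoff` over many such realizations gives a Monte Carlo estimate of the expected return for a fixed policy, which can serve as a rough cross-check on values computed from the functional equation.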

To develop the fundamental functional equation for the optimal
expected-value function, we begin by considering any interval of "backwards
time" of length $\Delta\tau$ which occurs from $\tau - \Delta\tau$ to $\tau$.
There are five exhaustive and mutually exclusive possibilities for random
events to occur in such an interval. These are

(1) one $X_1$ casualty occurs,

(2) one $X_2$ casualty occurs,

(3) one Y casualty occurs,

(4) no casualty occurs,

(5) more than one casualty occurs.

Let us now examine each of these cases and develop expected returns.

(1) One $X_1$ casualty occurs in $\Delta\tau$:

By our assumptions above, we have for the probability of occurrence
of this event

    Prob[one $X_1$ casualty occurs in $\Delta\tau$] $= \phi\, a_1 n\, \Delta\tau$.

Given that one $X_1$ casualty is realized in the interval from $\tau$ to
$\tau - \Delta\tau$, the optimal fire distribution policy for Y will consider
the maximum expected value for the return functional as casualties continue
to occur randomly from $\tau - \Delta\tau$ to $\tau = 0$. This maximum expected
value is $S(\tau - \Delta\tau, m_1(\tau - \Delta\tau), m_2(\tau - \Delta\tau), n(\tau - \Delta\tau))$
where $m_1(\tau - \Delta\tau) = m_1(\tau) - 1$,
$m_2(\tau - \Delta\tau) = m_2(\tau)$, and $n(\tau - \Delta\tau) = n(\tau)$.

(2) One $X_2$ casualty occurs in $\Delta\tau$:

Similarly, we have that

    Prob[one $X_2$ casualty occurs in $\Delta\tau$] $= (1 - \phi)\, a_2 n\, \Delta\tau$,

with the optimal expected-value function
$S(\tau - \Delta\tau, m_1(\tau), m_2(\tau) - 1, n(\tau))$.

Events (3) through (5) are analyzed in a similar fashion.

Now, by the standard dynamic programming argument which combines
the probabilities of events (1) through (5) above with the maximum expected
return achievable given that these events occur, we obtain the expression

$$\begin{aligned}
S(\tau, m_1, m_2, n) = \max_{\phi_S \in \Phi} \Big\{ & \big[ 1 - \Delta\tau \{ \phi_S a_1 n + (1 - \phi_S) a_2 n + (b_1 m_1 + b_2 m_2) \} \big]\, S(\tau - \Delta\tau, m_1, m_2, n) \\
& + \phi_S a_1 n\, \Delta\tau\, S(\tau - \Delta\tau, m_1 - 1, m_2, n) + (1 - \phi_S) a_2 n\, \Delta\tau\, S(\tau - \Delta\tau, m_1, m_2 - 1, n) \\
& + (b_1 m_1 + b_2 m_2)\, \Delta\tau\, S(\tau - \Delta\tau, m_1, m_2, n - 1) \Big\}. \qquad (35)
\end{aligned}$$

Rearranging terms in (35) and taking the limit as $\Delta\tau \to 0$, we
obtain the fundamental functional equation for the optimal expected-value
function

for $m_1, m_2, n > 0$:

$$\begin{aligned}
\frac{\partial S}{\partial \tau}(\tau, m_1, m_2, n) = {} & (b_1 m_1 + b_2 m_2) \{ S(\tau, m_1, m_2, n - 1) - S(\tau, m_1, m_2, n) \} \\
& + n \max_{\phi_S \in \Phi} \big[ \phi_S a_1 \{ S(\tau, m_1 - 1, m_2, n) - S(\tau, m_1, m_2, n) \} \\
& \qquad\qquad + (1 - \phi_S) a_2 \{ S(\tau, m_1, m_2 - 1, n) - S(\tau, m_1, m_2, n) \} \big], \qquad (36)
\end{aligned}$$

with the boundary condition at $t = t_f$

$$S(\tau = 0, m_1, m_2, n) = r n - p m_1 - q m_2, \qquad (37)$$

where $m_1$, $m_2$, and $n$ are integers and

$$\Phi = \left\{ 0, \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n - 1}{n}, 1 \right\}. \qquad (38)$$

Special forms of (36) in which $m_1 = 0$, etc., will be given later.

More concisely, we could have said that (36) results from combination
of the well-known formalism of dynamic programming with the retrospective
(backward) probabilistic evolution of the system over time (cf. [13], [22]).
It should be noted that (36) is a special case of an equation given by
Kushner in 1962 [21].

If we take (36) to be the basic equation for $S(\tau, m_1, m_2, n)$, then
(35) may be considered to be the simplest finite difference approximation
to it, i.e. the result of applying the well-known Euler's method to (36)
(see pp. 130-131 of [15]). (Of course, a method employing a higher order
approximation scheme (see pp. 132-140 of [15]) may be necessary under many
circumstances.) We will find this point of view convenient when we consider
developing a solution to (36).

Alternatively, we could have taken a discrete parameter Markov
chain as our basic combat model. It is readily shown that an optimal
policy exists for this latter formulation (see Theorem 1 on pp. 88-89 of
[22]), and that a policy which yields the maximum in (35) is an optimal
policy (see Theorem 2 on p. 89 of [22]).

5.2. On the Analytic Solution of the Fundamental Functional Equation.

The first task in determining an optimal fire distribution policy
(which requires obtaining the solution to (36) and (37)) is to develop the
entire system of equations (cf. equations (2) through (4)). We must,
therefore, develop the form that (36) takes at the boundary of the system,
i.e. $m_1 = 0$ or $m_2 = 0$ or $n = 0$, where the fire distribution problem
no longer exists. When $n = 0$, arguments similar to the above lead to

for $n = 0$, $m_1 \ge 0$, $m_2 \ge 0$:

$$\frac{\partial S}{\partial \tau}(\tau, m_1, m_2, 0) = 0 \quad \text{with} \quad S(\tau = 0, m_1, m_2, 0) = -m_1 p - m_2 q,$$

and hence

for $n = 0$, $m_1 \ge 0$, $m_2 \ge 0$:

$$S(\tau, m_1, m_2, 0) = -m_1 p - m_2 q. \qquad (39)$$

Similarly,

for $m_1 = 0$, $m_2 = 0$, $n \ge 0$:

$$S(\tau, 0, 0, n) = n r, \qquad (40)$$

for $m_1 = 0$, $m_2 > 0$, $n > 0$:

$$\frac{\partial S}{\partial \tau}(\tau, 0, m_2, n) = b_2 m_2 \{ S(\tau, 0, m_2, n - 1) - S(\tau, 0, m_2, n) \} + a_2 n \{ S(\tau, 0, m_2 - 1, n) - S(\tau, 0, m_2, n) \}, \qquad (41)$$

for $m_1 > 0$, $m_2 = 0$, $n > 0$:

$$\frac{\partial S}{\partial \tau}(\tau, m_1, 0, n) = b_1 m_1 \{ S(\tau, m_1, 0, n - 1) - S(\tau, m_1, 0, n) \} + a_1 n \{ S(\tau, m_1 - 1, 0, n) - S(\tau, m_1, 0, n) \}. \qquad (42)$$

Equations (36) through (42) are the complete system of equations for the
optimal expected-value function in the optimal control of the Lanchester
stochastic process.

For $m_1 > 0$, $m_2 > 0$, $n > 0$ the optimal fire distribution
policy is determined by the maximization operation in (34), and hence

$$\phi_S^*(\tau, m_1, m_2, n) = \begin{cases} 1 & \text{for } W(\tau, m_1, m_2, n) > 0, \\ 0 & \text{for } W(\tau, m_1, m_2, n) < 0, \end{cases} \qquad (43)$$

where we shall refer to $W(\tau, m_1, m_2, n)$ as the "switching function." It is
defined by

for $m_1 > 0$, $m_2 > 0$, $n > 0$,

$$W(\tau, m_1, m_2, n) = a_1 \{ S(\tau, m_1 - 1, m_2, n) - S(\tau, m_1, m_2, n) \} - a_2 \{ S(\tau, m_1, m_2 - 1, n) - S(\tau, m_1, m_2, n) \}. \qquad (44)$$

Let us observe that at the end of the battle at $t = t_f$, we may combine
(37), (43), and (44) to obtain

$$\phi_S^*(\tau = 0, m_1, m_2, n) = \begin{cases} 1 & \text{for } a_1 p > a_2 q, \\ 0 & \text{for } a_1 p < a_2 q, \end{cases} \qquad (45)$$

which is similar to results for the optimal control of the deterministic
process (7) (see, for example, (14), (21), and (22)).
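The step from (37), (43), and (44) to the terminal rule (45) is a one-line substitution. As a worked check, written here in LaTeX and using only the terminal payoff $rn - pm_1 - qm_2$ of (37) and the switching function of (44):

```latex
\begin{aligned}
W(0, m_1, m_2, n)
  &= a_1\bigl\{[\,rn - p(m_1 - 1) - q m_2\,] - [\,rn - p m_1 - q m_2\,]\bigr\} \\
  &\quad - a_2\bigl\{[\,rn - p m_1 - q(m_2 - 1)\,] - [\,rn - p m_1 - q m_2\,]\bigr\} \\
  &= a_1 p - a_2 q .
\end{aligned}
```

The sign of $W$ at $\tau = 0$ therefore depends only on the comparison of $a_1 p$ with $a_2 q$, independently of the force levels, which is the statement of (45).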

It should be noted that equations (36) through (42) have the same
form as those for the Lanchester square-law attrition stochastic process
(i.e. equations (2) through (4) when the attrition rates are given by (6)).
A general solution has not been obtained to these equations. Nevertheless,
it is of value to develop a partial solution. For example, since we use
finite difference methods to generate an approximate solution (see Section
5.3 below), it is desirable to check the adequacy of the approximation (in
particular, the "time step size" used in the numerical propagation of the
approximate solution by "marching ahead in time"). This is easily done by
comparing the approximate solution, denoted as $\hat{S}$, to the exact analytic
solution, denoted as $S$. Hence, a partial analytic solution is useful.

Careful consideration of (36) through (42) reveals that there are
restrictions on the order in which the optimal expected-value functions
$S(\tau, m_1, m_2, n)$ for $m_1 = 0, 1, 2, \ldots$, etc., can be computed. In
particular, an admissible sequence for building up the solution through
$S(\tau, 1, 1, 1)$ is shown below in Table III.

    m_1   m_2    n

     0     0     0
     1     0     0
     0     1     0
     0     0     1
     1     1     0
     0     1     1
     1     0     1
     1     1     1

Table III. Admissible Order for Computing Optimal Expected-Value
Functions (admissible order is from top to bottom).


We note that (36) becomes a first order system of ordinary differential
equations for $S(\tau, m_1, m_2, n)$ when $\phi_S^*$ as determined by (43) is
used. Solving for $S(\tau, m_1, m_2, n)$ for $m_1 = 0, 1, 2, \ldots$, etc., we
can then determine $\phi_S^*$ by (43). The synthesis of an optimal control by
combination of the control law (43) with integration of a system of
differential equations is similar to that for deterministic optimal control
problems.

Using (39) through (42), we readily compute in succession

$$S(\tau, 0, 0, 0) = 0, \quad S(\tau, 1, 0, 0) = -p, \quad S(\tau, 0, 1, 0) = -q, \quad S(\tau, 0, 0, 1) = r, \quad S(\tau, 1, 1, 0) = -p - q,$$

$$S(\tau, 0, 1, 1) = \left( \frac{b_2 r - a_2 q}{a_2 + b_2} \right) e^{-(a_2 + b_2)\tau} + \left( \frac{a_2 r - b_2 q}{a_2 + b_2} \right),$$

$$S(\tau, 1, 0, 1) = \left( \frac{b_1 r - a_1 p}{a_1 + b_1} \right) e^{-(a_1 + b_1)\tau} + \left( \frac{a_1 r - b_1 p}{a_1 + b_1} \right). \qquad (46)$$

Using (46), equations (36) and (37) become for $m_1 = 1$, $m_2 = 1$, $n = 1$,

$$\begin{aligned}
\frac{\partial S}{\partial \tau}(\tau, 1, 1, 1) = {} & -(b_1 + b_2) \{ S(\tau, 1, 1, 1) + (p + q) \} \\
& + \max_{\phi_S = 0 \text{ or } 1} \big[ \phi_S a_1 \{ S(\tau, 0, 1, 1) - S(\tau, 1, 1, 1) \} + (1 - \phi_S) a_2 \{ S(\tau, 1, 0, 1) - S(\tau, 1, 1, 1) \} \big], \qquad (47)
\end{aligned}$$

with

$$S(\tau = 0, 1, 1, 1) = r - p - q,$$

where $S(\tau, 0, 1, 1)$ and $S(\tau, 1, 0, 1)$ are given by (46).
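As an illustration of how the entries of (46) arise, the case $(m_1, m_2, n) = (0, 1, 1)$ can be worked out directly from (41), using the already known values $S(\tau,0,1,0) = -q$ and $S(\tau,0,0,1) = r$ and the terminal condition (37). The following LaTeX sketch is a reconstruction of that routine step, not a quotation from the source:

```latex
\begin{aligned}
\frac{\partial S}{\partial\tau}(\tau,0,1,1)
  &= b_2\{-q - S(\tau,0,1,1)\} + a_2\{r - S(\tau,0,1,1)\} \\
  &= (a_2 r - b_2 q) - (a_2 + b_2)\,S(\tau,0,1,1),
  \qquad S(0,0,1,1) = r - q, \\[4pt]
S(\tau,0,1,1)
  &= \frac{a_2 r - b_2 q}{a_2 + b_2}
   + \Bigl(r - q - \frac{a_2 r - b_2 q}{a_2 + b_2}\Bigr) e^{-(a_2+b_2)\tau} \\
  &= \frac{a_2 r - b_2 q}{a_2 + b_2}
   + \frac{b_2 r - a_2 q}{a_2 + b_2}\, e^{-(a_2+b_2)\tau}.
\end{aligned}
```

The expression for $S(\tau,1,0,1)$ follows in the same way from (42) with the roles of $(a_2, b_2, q)$ taken by $(a_1, b_1, p)$.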

Using (43), (44), and (45), we may readily solve (47). As for the
deterministic formulation, there are two cases that must be distinguished:

Case (1) $a_1 p \ge a_2 q$,

Case (2) $a_1 p < a_2 q$.

For Case (1): $a_1 p \ge a_2 q$, we have that $\phi_S^*(\tau, 1, 1, 1) = 1$ for
$0 \le \tau \le \tau_1$, where $\tau_1$ denotes the "backwards time" of the first
switch in the optimal fire distribution policy. Thus $\tau_1$ is the smallest
$\tau$ which satisfies $W(\tau = \tau_1, 1, 1, 1) = 0$ with $W(\tau, 1, 1, 1)$
given by (44). Then, we have

for $0 \le \tau \le \tau_1$ when $a_1 p \ge a_2 q$ ($\phi_S^*(\tau, 1, 1, 1) = 1$):

$$\begin{aligned}
S(\tau, 1, 1, 1) = {} & \frac{a_1 (b_2 r - a_2 q)}{(a_1 + b_1 - a_2)(a_2 + b_2)}\, e^{-(a_2 + b_2)\tau} \\
& + \left\{ \frac{[(b_1 - a_2)(b_1 + b_2) + a_1 b_1]\, r}{(a_1 + b_1 + b_2)(a_1 + b_1 - a_2)} - \frac{a_1 p}{a_1 + b_1 + b_2} + \frac{a_1 a_2 q}{(a_1 + b_1 - a_2)(a_1 + b_1 + b_2)} \right\} e^{-(a_1 + b_1 + b_2)\tau} \\
& + \left\{ \frac{a_1 a_2 r}{(a_2 + b_2)(a_1 + b_1 + b_2)} - \frac{(b_1 + b_2)\, p}{a_1 + b_1 + b_2} - \frac{[(b_1 + b_2)(a_2 + b_2) + a_1 b_2]\, q}{(a_2 + b_2)(a_1 + b_1 + b_2)} \right\}. \qquad (48)
\end{aligned}$$
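A closed form such as (48) is easy to mistype, so it is worth checking numerically that it satisfies (47) with $\phi_S = 1$ and the initial condition $S(0,1,1,1) = r - p - q$. The Python sketch below does this for one hypothetical Case (1) parameter choice; the function names and the numerical values are illustrative assumptions, and the formula coded is the three-term form with exponents $-(a_2+b_2)\tau$ and $-(a_1+b_1+b_2)\tau$ (which requires $a_1 + b_1 \ne a_2$).

```python
import math

def s011(tau, a2, b2, q, r):
    """S(tau,0,1,1) from (46)."""
    k = a2 + b2
    return ((b2 * r - a2 * q) * math.exp(-k * tau) + (a2 * r - b2 * q)) / k

def s111_phi1(tau, a1, a2, b1, b2, p, q, r):
    """Candidate closed form for S(tau,1,1,1) while phi = 1, as in (48)."""
    s, k = a1 + b1, a2 + b2
    coef_a = a1 * (b2 * r - a2 * q) / ((s - a2) * k)
    coef_b = (((b1 - a2) * (b1 + b2) + a1 * b1) * r / ((s + b2) * (s - a2))
              - a1 * p / (s + b2)
              + a1 * a2 * q / ((s - a2) * (s + b2)))
    const = (a1 * a2 * r / (k * (s + b2))
             - (b1 + b2) * p / (s + b2)
             - ((b1 + b2) * k + a1 * b2) * q / (k * (s + b2)))
    return coef_a * math.exp(-k * tau) + coef_b * math.exp(-(s + b2) * tau) + const

def rhs47_phi1(tau, value, a1, a2, b1, b2, p, q, r):
    """Right-hand side of (47) with phi = 1, evaluated at S(tau,1,1,1) = value."""
    return -(b1 + b2) * (value + p + q) + a1 * (s011(tau, a2, b2, q, r) - value)

# Hypothetical Case (1) parameters (a1*p >= a2*q); not taken from the paper.
PARAMS = dict(a1=0.04, a2=0.01, b1=0.03, b2=0.02, p=2.0, q=1.0, r=1.5)

def residual(tau, h=1e-5):
    """Central-difference derivative of the closed form minus the ODE right-hand side."""
    deriv = (s111_phi1(tau + h, **PARAMS) - s111_phi1(tau - h, **PARAMS)) / (2 * h)
    return deriv - rhs47_phi1(tau, s111_phi1(tau, **PARAMS), **PARAMS)
```

If `residual(tau)` is negligibly small for a range of `tau` and `s111_phi1(0, ...)` equals `r - p - q`, the closed form is consistent with (47) on the interval where $\phi_S^* = 1$. The same pattern can be used to check the corresponding expression for the interval after the first switch.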



We note that $\tau_1$ might be equal to $+\infty$, i.e. we never switch. Assuming
that a switch in targets does occur, however, let us denote
$S(\tau = \tau_1, 1, 1, 1)$ by $S_0$ where, as we recall, $\tau_1$ is the smallest
$\tau$ which satisfies $W(\tau = \tau_1, 1, 1, 1) = 0$. Then, we have that
$\phi_S^*(\tau, 1, 1, 1) = 0$ for $\tau_1 < \tau \le \tau_2$, where $\tau_2$ denotes
the "backwards time" of the second switch in the optimal fire distribution
policy. Then, we have

for $\tau_1 < \tau \le \tau_2$ when $a_1 p \ge a_2 q$ ($\phi_S^*(\tau, 1, 1, 1) = 0$):

$$\begin{aligned}
S(\tau, 1, 1, 1) = {} & \frac{a_2 (b_1 r - a_1 p)}{(a_2 + b_2 - a_1)(a_1 + b_1)} \left\{ e^{-(a_1 + b_1)\tau} - e^{(a_2 + b_2)(\tau_1 - \tau) - a_1 \tau_1 - b_1 \tau} \right\} \\
& + \left\{ S_0 - \frac{a_1 a_2 r}{(a_1 + b_1)(a_2 + b_2 + b_1)} + \frac{[(b_1 + b_2)(a_1 + b_1) + a_2 b_1]\, p}{(a_1 + b_1)(a_2 + b_2 + b_1)} + \frac{(b_1 + b_2)\, q}{a_2 + b_2 + b_1} \right\} e^{(a_2 + b_2 + b_1)(\tau_1 - \tau)} \\
& + \left\{ \frac{a_1 a_2 r}{(a_1 + b_1)(a_2 + b_2 + b_1)} - \frac{[(b_1 + b_2)(a_1 + b_1) + a_2 b_1]\, p}{(a_1 + b_1)(a_2 + b_2 + b_1)} - \frac{(b_1 + b_2)\, q}{a_2 + b_2 + b_1} \right\}. \qquad (49)
\end{aligned}$$

Again, we note that $\tau_2$ might be equal to $+\infty$, i.e. we might never
redistribute fire a second time. Assuming that a second switch in fire
distribution does occur, we have $\phi_S^*(\tau, 1, 1, 1) = 1$ for
$\tau_2 < \tau \le \tau_3$. We have not carried out the computation of
$S(\tau, 1, 1, 1)$ past $\tau_2$.

For Case (2): $a_1 p < a_2 q$, the results are symmetric to the above
(interchange the roles of $X_1$ and $X_2$) and hence are omitted.

Although the above constitutes a complete development for $S(\tau, 1, 1, 1)$
(and hence $\phi_S^*(\tau, 1, 1, 1)$ via $W(\tau, 1, 1, 1)$), these results are
complex enough that it is not immediately clear how $\phi_S^*(\tau, 1, 1, 1)$
changes over time and/or depends on model parameters.

5.3. Development of Numerical Solution.

With the advent of modern high-speed digital computers, finite
difference methods of obtaining an approximate solution are commonly used
when an analytic solution cannot be obtained to equations like (36) through
(42). Euler's method (see pp. 130-131 of [15]) yields the simplest finite
difference approximation for (36). Let us denote the approximation to the
optimal expected-value function as $\hat{S}$. We shall compute values for this
approximation at discrete points in time separated by a constant amount
$\Delta\tau$. We let $\tau = \ell\,\Delta\tau$ so that $t_f = L\,\Delta\tau$. Then (36)
may be approximated by

for $m_1 > 0$, $m_2 > 0$, $n > 0$:

$$\begin{aligned}
\hat{S}((\ell + 1)\Delta\tau, m_1, m_2, n) = {} & \{ 1 - (\Delta\tau)(b_1 m_1 + b_2 m_2) \}\, \hat{S}(\ell\Delta\tau, m_1, m_2, n) + (\Delta\tau)(b_1 m_1 + b_2 m_2)\, \hat{S}(\ell\Delta\tau, m_1, m_2, n - 1) \\
& + n (\Delta\tau) \max_{\phi_S \in \Phi} \big[ \phi_S a_1 \{ \hat{S}(\ell\Delta\tau, m_1 - 1, m_2, n) - \hat{S}(\ell\Delta\tau, m_1, m_2, n) \} \\
& \qquad\qquad + (1 - \phi_S) a_2 \{ \hat{S}(\ell\Delta\tau, m_1, m_2 - 1, n) - \hat{S}(\ell\Delta\tau, m_1, m_2, n) \} \big], \qquad (50)
\end{aligned}$$

for $\ell = 0, 1, \ldots, L - 1$ with boundary condition (37) and also (38).
Similar approximations may be developed for (41) and (42).
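The marching scheme (50), together with its analogues for (41) and (42) and the constant boundary values (39) and (40), can be sketched compactly. The Python code below is an illustrative reimplementation under stated assumptions, not the authors' FORTRAN program: the function name and the dictionary storage are hypothetical, ties in the maximization are broken toward $\phi_S = 1$, and it uses the fact that the bracketed expression in (50) is linear in $\phi_S$, so its maximum over $\Phi$ is attained at $\phi_S = 0$ or $\phi_S = 1$.

```python
def solve_euler(M1, M2, N, t_f, steps, a1, a2, b1, b2, p, q, r):
    """Euler march in backwards time for all states up to (M1, M2, N).

    Returns (S, phi): S maps (m1, m2, n) to the approximate value at tau = t_f,
    phi maps states with m1, m2, n > 0 to the maximizing control at the last step.
    """
    d = t_f / steps
    # Boundary condition (37) at tau = 0.
    S = {(m1, m2, n): r * n - p * m1 - q * m2
         for m1 in range(M1 + 1) for m2 in range(M2 + 1) for n in range(N + 1)}
    phi = {}
    for _ in range(steps):
        new = {}
        for (m1, m2, n), s in S.items():
            if n == 0 or (m1 == 0 and m2 == 0):
                new[(m1, m2, n)] = s          # constant in tau, cf. (39), (40)
                continue
            y_loss = (b1 * m1 + b2 * m2) * (S[(m1, m2, n - 1)] - s)
            g1 = a1 * (S[(m1 - 1, m2, n)] - s) if m1 > 0 else None
            g2 = a2 * (S[(m1, m2 - 1, n)] - s) if m2 > 0 else None
            if g1 is None:                     # only X2 remains, cf. (41)
                gain = g2
            elif g2 is None:                   # only X1 remains, cf. (42)
                gain = g1
            else:                              # interior state, cf. (50)
                gain = max(g1, g2)
                phi[(m1, m2, n)] = 1 if g1 >= g2 else 0
            new[(m1, m2, n)] = s + d * (y_loss + n * gain)
        S = new
    return S, phi

# Example with Parameter Set 1 of Table V and a small state space.
S_hat, phi_hat = solve_euler(1, 1, 1, 50.0, 5000,
                             a1=0.025, a2=0.015, b1=0.035, b2=0.005,
                             p=0.75, q=2.25, r=2.0)
```

Because all values at step $\ell$ are held while step $\ell + 1$ is filled in, only two time levels are stored at once, and the order in which states are visited within a step does not matter in this formulation.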

[Footnote to the closing remark of Section 5.2: We recall that for the
deterministic formulation when $x_1(t_f) > 0$ and $x_2(t_f) > 0$, the
conditions $a_1 p \ge a_2 q$ and $a_1 b_1 > a_2 b_2$ imply that
$\phi^*(t, x_1, x_2, y) = 1$ for the entire battle.]

As noted above, consideration of (36) through (42) yields that
there are restrictions on the order in which the optimal expected-value
function $S$ (or its approximation $\hat{S}$) is computed. The computation of
$\hat{S}((\ell + 1)\Delta\tau, m_1, m_2, n)$ depends upon the quantities shown in
Figure 1 below.

Figure 1. Dependence of Optimal Expected-Value Function
on Discrete State Variables.

Based on the dependence depicted in Figure 1, the solution can be
"built up" as shown in Table IV.

Table IV. Admissible Order for Computing Optimal
Expected-Value Functions.

    m_1   m_2    n          m_1   m_2    n

     0     0     0           3     1     0
     1     0     0           3     2     0
     0     1     0           1     3     0
     0     0     1           2     3     0
     1     1     0           3     3     0
     0     1     1           0     3     1
     1     0     1           0     3     2
     1     1     1           0     1     3
     2     0     0           0     2     3
     0     2     0           0     3     3
     0     0     2           3     0     1
     2     1     0           3     0     2
     1     2     0           1     0     3
     2     2     0           2     0     3
     0     2     1           3     0     3
     0     1     2           3     1     1
     0     2     2           3     2     1
     2     0     1           3     1     2
     1     0     2           3     2     2
     2     0     2           1     3     1
     2     1     1           2     3     1
     1     2     1           3     3     1
     1     1     2           1     3     2
     2     2     1           2     3     2
     1     2     2           1     1     3
     2     1     2           2     1     3
     2     2     2           3     1     3
     3     0     0           1     2     3
     0     3     0           1     3     3
     0     0     3           2     2     3
                             3     3     2
                             2     3     3
                             3     2     3
                             3     3     3
                             4     0     0
                             0     4     0
                             etc.

Note: Admissible order is top to bottom, starting with
column (composed of m_1, m_2, n) on left.

It remains to discuss the adequacy of the finite difference
approximation (50). It is well-known (see pp. 130-145 in [15]) that Euler's
method yields a finite difference approximation for such a system of
differential equations that is both consistent and stable so that the
approximate solution $\hat{S}$ can be guaranteed to converge to the exact analytic
solution $S$ as $\Delta\tau \to 0$ (and $L \to \infty$) [28]. However, $\Delta\tau$
must not be too large in order to keep the truncation error satisfactorily
small. Moreover, the time step size $\Delta\tau$ is also limited by the fact that
quantities like $(\Delta\tau)(b_1 m_1 + b_2 m_2)$ or $a_1 n \Delta\tau$ or
$a_2 n \Delta\tau$ in (50) represent probabilities and hence must be less than
one. In our computational work we have used a time step size which yields
agreement in the fourth decimal place to the right of the decimal point when
$\hat{S}$ is compared to the exact analytic solution $S$ in the special cases
(such as (48) and (49)) when the latter has been obtained.

[Footnote: A computer program has been written in FORTRAN for this purpose.]
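The probability restriction on the time step can be turned into a simple bound. The Python sketch below is an illustration only: the function name is hypothetical, and it uses the slightly stronger requirement that the total one-step casualty probability, $\Delta\tau\,\{\max(a_1, a_2)\, n + b_1 m_1 + b_2 m_2\}$, stay below one at the largest force levels considered, which also keeps the "no casualty" weight in (35) non-negative.

```python
def max_time_step(M1, M2, N, a1, a2, b1, b2):
    """Upper bound on the step so that the total casualty probability per step is < 1.

    Evaluated at the largest force levels (M1, M2, N), where the rates are greatest.
    """
    worst_total_rate = max(a1, a2) * N + b1 * M1 + b2 * M2
    return 1.0 / worst_total_rate

# Parameter Set 1 of Table V with initial force levels 5, 5, 5.
step_bound = max_time_step(5, 5, 5, a1=0.025, a2=0.015, b1=0.035, b2=0.005)
```

This bound only guarantees that the difference scheme has a probabilistic interpretation. The accuracy criterion described in the text (agreement with the analytic solution in the fourth decimal place) will generally call for a considerably smaller step.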

6. Comparison of Results from Deterministic and Stochastic Formulations.

In this section we compare the structures of the optimal fire
distribution policy between the deterministic control problem (7) and the
stochastic control problem (8). Before presenting this comparison, it
seems appropriate to discuss some general methodological considerations.

Any comparison between the two models should be guided by the purpose
of the comparison. In the paper at hand our purpose is to consider whether
the structure of the optimal fire distribution policy is the same for the
two formulations. In other words, we would like to determine upon what
groups of model parameters the optimal allocation rule depends and whether
this depends upon the particular form of model adopted (here deterministic
or stochastic). The things that can be compared between the two models
are (1) the optimal fire distribution policy and (2) the optimal (expected)
return. It is the opinion of the authors that the second criterion (i.e.
optimal return) is only significant when there are differences between the
optimal policies from the two models. Furthermore, there are two types of
comparisons that we can make between the models: one is quantitative and
the other is qualitative.

A direct quantitative comparison of the optimal policies obtained
from the two formulations is impossible: on the one hand for the
deterministic model we have a piecewise differentiable battle trajectory,
while on the other hand for the stochastic model we have a discontinuous
Markov process describing the force levels. Thus, we have
$\phi_D^*(t, x_1, x_2, y)$ for the deterministic formulation with $x_1$, $x_2$,
and $y$ varying continuously over time, and we have
$\phi_S^*(t, m_1, m_2, n)$ for the stochastic formulation with $m_1$, $m_2$,
and $n$ restricted to be non-negative integers and casualties occurring
randomly as a Markov jump process. The impossibility of directly comparing
$\phi_D^*(t, x_1, x_2, y)$ and $\phi_S^*(t, m_1, m_2, n)$ continuously over time
should be apparent.

[Footnote: The only papers known to the authors in which a quantitative
comparison between results for deterministic and stochastic optimal control
problems is made are [48] and [49]. In both papers the state space is
continuous in the stochastic problem.]

Nevertheless, we can still qualitatively compare the structures of
the two policies. There is, however, a difficulty in that
$\phi_S^*(\tau, m_1, m_2, n)$ represents a conditional policy, i.e. the optimal
policy given that the system is in state $(m_1, m_2, n)$ with "backwards time"
$\tau$ remaining in the battle. When a state transition occurs (randomly) to
$(m_1', m_2', n')$, then the optimal policy accordingly becomes
$\phi_S^*(\tau, m_1', m_2', n')$. In comparing the optimal policies this should
be taken into account, since it does not seem appropriate to compare
$\phi_S^*(\tau, m_1^0, m_2^0, n^0)$ with $m_1^0$, $m_2^0$, and $n^0$ held constant
to $\phi_D^*(t, x_1, x_2, y)$ with $x_1$, $x_2$, and $y$ changing (continuously)
over time. Since for the stochastic formulation it does not make sense
to consider an "average" optimal policy or the optimal policy for "average"
force levels, for comparison with the optimal policy for the deterministic
formulation we have considered a realization of the stochastic attrition
process in which the force levels are always "near to" those of the
corresponding deterministic process. In other words, we will compare
$\phi_D^*(t, x_1, x_2, y)$ to $\phi_S^*(\tau, m_1, m_2, n)$ at selected values of
$t$, $x_1$, $x_2$, and $y$. The force levels in the deterministic model are
rounded to integers to yield the values of $m_1$, $m_2$, and $n$ as follows:
$m_1 = [x_1] + 1$ (and $m_1 = 0$ when $x_1 = 0$), where $[x]$ denotes "the
greatest integer in $x$," i.e. $[3.96] = 3$.
Moreover, in our comparison we will try to use the results obtained from
the deterministic formulation to gain insight into the behavior of the
optimal policy for the stochastic control problem. In other words, we
will try to explain results from the stochastic formulation by considering
the corresponding behavior for the deterministic formulation.
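The rounding rule used to map deterministic force levels to integer states can be written in a few lines. The Python sketch below is only an illustration of the rule $m = [x] + 1$ for $x > 0$ and $m = 0$ for $x = 0$; the function name is hypothetical.

```python
import math

def round_force_level(x):
    """Map a deterministic force level x >= 0 to an integer state: [x] + 1, or 0 if x == 0."""
    if x < 0:
        raise ValueError("force levels are non-negative")
    if x == 0:
        return 0
    return math.floor(x) + 1

# A deterministic state (x1, x2, y) and the integer state used for comparison.
deterministic_state = (1.37, 4.96, 0.82)
stochastic_state = tuple(round_force_level(x) for x in deterministic_state)
```

With this rule a force level of 3.96 maps to 4, and any positive remnant of a force, however small, is still counted as one combatant.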

Numerical results have been generated using two FORTRAN programs
run on an IBM 360-67 computer. The program which generates
$\phi_D^*(t, x_1, x_2, y)$ (and also the force level trajectories) has been
discussed in Section 4.5. The program which generates
$\phi_S^*(t, m_1, m_2, n)$ performs the computations described in Section 5.3.
The program for the stochastic formulation is limited by computer memory
requirements. Results for all force levels are retained for two time steps.
A battle with $m_1^0 = 5$, $m_2^0 = 5$, and $n^0 = 5$ requires 200,000 bytes of
computer memory, and this increases exponentially with the force levels as
Table IV indicates. Thus, most runs of the computer program for the
stochastic formulation have been with the above as the upper limit for
initial force levels, although we have run one case with $m_1^0 = 9$,
$m_2^0 = 9$, and $n^0 = 9$ which required nearly 2,000,000 bytes of memory.

The above computer programs have been run for over fifteen different
"parameter sets," typical examples of which are shown in Table V. In all
cases we have chosen parameter values so that $a_1 b_1 > a_2 b_2$. The optimal
policies for the deterministic and the stochastic formulations have been
compared as discussed above. The results of these comparisons will now be
summarized.

[Footnote on the rounding rule $m_1 = [x_1] + 1$: This is done so that an
interval process (time between casualties) of the casualty process will be
"similar" in the deterministic and stochastic formulations.]

Table V. Parameter Sets Used to Generate Numerical
Results Given in Tables VI through VIII.

    Parameter
       Set       a_1      a_2      b_1      b_2       p       q       r

        1       0.025    0.015    0.035    0.005    0.75    2.25    2.0
        2       0.005    0.003    0.007    0.001    0.15    0.45    0.4
        3       0.085    0.080    0.03     0.03     1.0     2.0     2.0

For all the above parameter sets we have $a_1 b_1 > a_2 b_2$ and
$a_1 p < a_2 q$.
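The conditions that govern the structure of the optimal policy, namely the comparison of $a_1 b_1$ with $a_2 b_2$ and of $a_1 p$ with $a_2 q$, can be checked mechanically for the tabulated parameter sets. The Python sketch below does this for the three sets of Table V; the dictionary layout and function name are hypothetical, and the column assignment $(a_1, a_2, b_1, b_2, p, q, r)$ is the reading of the table assumed here.

```python
PARAMETER_SETS = {
    1: dict(a1=0.025, a2=0.015, b1=0.035, b2=0.005, p=0.75, q=2.25, r=2.0),
    2: dict(a1=0.005, a2=0.003, b1=0.007, b2=0.001, p=0.15, q=0.45, r=0.4),
    3: dict(a1=0.085, a2=0.080, b1=0.03, b2=0.03, p=1.0, q=2.0, r=2.0),
}

def classify(ps):
    """Return (square-law condition holds, case number) for one parameter set."""
    attrition_ok = ps["a1"] * ps["b1"] > ps["a2"] * ps["b2"]
    case = 1 if ps["a1"] * ps["p"] >= ps["a2"] * ps["q"] else 2
    return attrition_ok, case

summary = {name: classify(ps) for name, ps in PARAMETER_SETS.items()}
```

For these values every set satisfies $a_1 b_1 > a_2 b_2$ and falls under Case (2), so each is a case in which a switch in fire distribution can occur.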



The first thing to be pointed out is that the optimal fire
distribution policy for both formulations has the property that $\phi^*$ is
either 0 or 1 (almost everywhere in time). For the deterministic formulation,
we have shown [34] that a singular solution is impossible and that
$\phi_D^*$ must be 0 or 1 except for at most one point in time. Although we
have not proved such a result for the stochastic formulation, we have never
encountered any exception to it in all our numerical computations. As we
have discussed above, two cases must be distinguished:

Case (1) $a_1 p \ge a_2 q$,

Case (2) $a_1 p < a_2 q$.


For Case (1): $a_1 p \ge a_2 q$, the optimal policy is apparently identical
for both formulations: $\phi_D^*(t, x_1, x_2, y) = \phi_S^*(t, m_1, m_2, n) = 1$
for $x_1 > 0$ (or $m_1 > 0$). We recall that this result has been proved for
the deterministic formulation. Although a proof has not been found, it
apparently is also true for the stochastic formulation. No exception has
been encountered in all the cases for which numerical determinations have
been made.

[Footnote on the 0-or-1 property of $\phi^*$: See [36] for a discussion of why
this is so and for an example of a similar problem with a different
attrition process for which $\phi^*$ may take on an intermediate value, i.e.
$0 < \phi^* < 1$ (see also [38]).]

For Case (2): $a_1 p < a_2 q$, the optimal policies are similar but not
identical. The basic structures are apparently essentially the same. As
discussed above, the two policies have been compared at selected points
along a deterministic trajectory by considering a corresponding realization
of the stochastic process obtained by rounding the deterministic force
levels. The time of such a comparison is rounded up to the next whole
minute in the case of the occurrence of a casualty and to the next 0.01
minute in the case of a switch in fire distribution. Cases corresponding
to over ten parameter sets have been considered; illustrative examples of
such parameter sets are shown in Table V.

In Table VI we show some typical comparisons. Although not shown
in Table VI, it should be noted that in all the cases numerically computed
$\phi_S^*(\tau, m_1, m_2, n)$ had the property that for constant $m_1$, $m_2$,
and $n$, $\phi_S^*(\tau, m_1, m_2, n) = 0$ for $0 \le \tau < \tau_1$ and
$\phi_S^*(\tau, m_1, m_2, n) = 1$ for $\tau > \tau_1$, where $\tau$ denotes the
"backwards time." In Table VI we show the optimal policies for the two
formulations for two parameter sets. The optimal policies are given at
discrete points in time following the above discussion. These times
correspond to a switching time in one of the formulations or the occurrence
of a casualty in the "typical" realization of the stochastic process. The
deterministic force levels $x_1$, $x_2$, and $y$ from which $m_1$, $m_2$, and
$n$ have been determined are not shown in Table VI. The optimal returns for
the two formulations are also shown.

The results shown in Table VI are typical and indicate (at least 
for all the cases so far computed) that there is no fundamental difference 
between the structures of the two optimal policies, at least where the 
deterministic battle does not terminate prematurely, i.e. $t_f = t_{\max}$.† 

† Thus, these remarks apply to cases in which optimal deterministic 
trajectories lead to terminal states $S_1$, $S_2$, and $S_3$. 

Table VI. Comparisons of Results from Deterministic 
and Stochastic Optimal Control Problems 
(Deterministic Trajectory Leads to Terminal State $S_1$) 

Parameter Set 1 

Elapsed Time, t         Force Levels           Deterministic 
(minutes)               $m_1$  $m_2$  $n$      $\phi^*(t,x_1,x_2,y)$   Optimal Return   $\phi_S^*(t,m_1,m_2,n)$   $S(t,m_1,m_2,n)$ 

0                         2      5      3            1                -10.95                 1                  -8.93 
13                        2      5      2            1                -10.95                 1                  -11.16 
18                        1      5      2            1                -10.95                 1                  -9.12 
31                        1      5      1            1                -10.95                 1                  -10.96 
35.39                     1      5      1            1                -10.95                 0                  -10.79 
41.28                     1      5      1            0                -10.95                 0                  -10.54 
50 = $t_f$ = $t_{\max}$   1      5      1            0                -10.95                 0                  -10.00 

Parameter Set 2 

Elapsed Time, t         Force Levels           Deterministic 
(minutes)               $m_1$  $m_2$  $n$      $\phi^*(t,x_1,x_2,y)$   Optimal Return   $\phi_S^*(t,m_1,m_2,n)$   $S(t,m_1,m_2,n)$ 

0                         5      5      5            1                -2.06                  1                  -0.62 
27                        5      5      4            1                -2.06                  1                  -2.17 
50                        4      5      4            1                -2.06                  1                  -1.67 
55                        4      5      4            1                -2.06                  0                  -1.64 
56                        4      5      3            1                -2.06                  0                  -2.06 
56.38                     4      5      3            0                -2.06                  0                  -2.05 
87                        4      5      2            0                -2.06                  0                  -2.18 
100 = $t_f$ = $t_{\max}$  4      5      2            0                -2.06                  0                  -2.05 

Parameter Set 2 

Elapsed Time, t         Force Levels 
(minutes)               $m_1$  $m_2$  $n$      $\phi^*(t,x_1,x_2,y)$   $\phi_S^*(t,m_1,m_2,n)$ 

0                         5      5      5            1                       1 
5.61                      5      5      5            1                       0 
6.38                      5      5      5            0                       0 
26                        5      5      4            0                       0 
50 = $t_f$ = $t_{\max}$   5      5      4            0                       0 

The reader should note that $\phi_S^*$ changes somewhat earlier in forward 
time from 1 to 0 than does $\phi^*$ (at least for the realization of the 
stochastic process considered here). 

In cases in which the deterministic battle ends prematurely (i.e. 
the optimal trajectory leads to one of four terminal states reached before 
$t_{\max}$, among them $S_7$) more pronounced quantitative differences may 
occur. This is illustrated by the cases shown in Table VII. As noted above, 
the deterministic trajectory determines at which values of $m_1$, $m_2$, $n$, 
and $t$ we look at $\phi_S^*$. This should explain to the reader why the 
stochastic results shown in Table VII are not realizable.‖ Thus, for the 
first battle shown in Table VII, a realization of the stochastic battle 
would evolve differently (in structure) than the deterministic battle due 
to this difference in the optimal controls. The authors feel that this is 
due to the fact that Y marginally wins the deterministic battle, and thus 
in the stochastic model there is a fairly good probability¶ at $t$ much 
less than $t_{\max}$ that Y will lose the battle. In other words, there are 
some possible probabilistic trajectories which yield a reduced payoff to Y. 
These are weighted in the stochastic decision process, and Y consequently 
follows a more conservative policy for the stochastic formulation. For the 
case of the first battle shown in Table VII, Y essentially gives up his 
chances of winning to guarantee a given level of return. This phenomenon is 
similar to the "flypaper effect" noted by Whittle [48] in certain 
stochastic optimal control problems. In the second battle shown, Y achieves 
a clear-cut victory in the deterministic battle, and this phenomenon does 
not occur. 

‖ A transition from $(m_1, m_2, n) = (3,5,5)$ to $(2,5,5)$ is impossible 
when $\phi_S^* = 0$. 

¶ This probability has not been explicitly determined. 
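
The realizability remark can be checked mechanically. The sketch below is 
only an illustration: the function name is hypothetical, and it assumes that 
the tabulated control is held from each listed time until the next one and 
that $\phi = 1$ places all Y-fire on $X_1$ while $\phi = 0$ places it all on 
$X_2$. The rows are the force levels of the first battle of Table VII paired 
with the stochastic control (all zeros), and the realization of Table VIII. 

```python
def inconsistent_transitions(rows):
    """Flag casualties that the control in force could not have produced.

    rows: list of (t, m1, m2, n, phi), phi in {0, 1}, where phi is assumed
    to be held from time t until the next listed time.
    """
    problems = []
    for (t0, m1, m2, _n, phi), (t1, m1_next, m2_next, _nn, _p) in zip(rows, rows[1:]):
        if m1_next < m1 and phi == 0:
            problems.append((t0, t1, "X1 casualty while all fire is on X2"))
        if m2_next < m2 and phi == 1:
            problems.append((t0, t1, "X2 casualty while all fire is on X1"))
    return problems

# First battle of Table VII: deterministic force levels, stochastic control.
table_vii_first = [
    (0, 3, 5, 5, 0), (3, 2, 5, 5, 0), (5, 2, 5, 4, 0), (6, 1, 5, 4, 0),
    (8.59, 0, 5, 4, 0), (11, 0, 5, 3, 0), (13, 0, 4, 3, 0), (18, 0, 3, 3, 0),
    (21, 0, 3, 2, 0), (24, 0, 2, 2, 0), (31, 0, 1, 2, 0), (40.1, 0, 0, 2, 0),
]

# Realization of Table VIII with the stochastic control actually followed.
table_viii = [
    (0, 3, 5, 5, 0), (0.5, 3, 4, 5, 0), (0.7, 3, 3, 5, 1), (10.0, 2, 3, 5, 1),
    (15.0, 2, 3, 4, 0), (20.0, 2, 2, 4, 1), (23.55, 2, 2, 4, 0), (24.0, 2, 2, 3, 0),
    (25.0, 2, 1, 3, 1), (26.0, 1, 1, 3, 1), (30.0, 1, 1, 2, 0), (35.0, 1, 0, 2, 1),
]

for t0, t1, why in inconsistent_transitions(table_vii_first):
    print(f"t = {t0} -> {t1}: {why}")
print(len(inconsistent_transitions(table_viii)), "inconsistencies in Table VIII")
```

Under these assumptions the first battle of Table VII is flagged wherever 
$m_1$ falls, while the Table VIII realization passes the check. 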

Table VII. Comparisons of Results from Deterministic 
and Stochastic Optimal Control Problems 
(Deterministic Trajectory Leads to Terminal State $S_7$) 

First battle: Parameter Set 3, $t_{\max}$ = 50 minutes 

Elapsed Time, t         Force Levels 
(minutes)               $m_1$  $m_2$  $n$      $\phi^*$     $\phi_S^*$ 

0                         3      5      5         1            0 
3                         2      5      5         1            0 
5                         2      5      4         1            0 
6                         1      5      4         1            0 
8.59                      0      5      4         0            0 
11                        0      5      3         0            0 
13                        0      4      3         0            0 
18                        0      3      3         0            0 
21                        0      3      2         0            0 
24                        0      2      2         0            0 
31                        0      1      2         0            0 
40.1 = $t_f$              0      0      2         0            0 

Second battle: Parameter Set 3 

Elapsed Time, t         Force Levels 
(minutes)               $m_1$  $m_2$  $n$      $\phi^*$     $\phi_S^{50*}$   $\phi_S^{40*}$   $\phi_S^{30*}$   $\phi_S^{20*}$ 

0                         2      3      5         1            1                1                1                0 
3                         1      3      5         1            1                1                0                0 
5.04                      0      3      5         0            0                0                0                0 
8                         0      2      5         0            0                0                0                0 
11                        0      1      5         0            0                0                0                0 
14                        0      1      4         0            0                0                0                0 
14.11 = $t_f$             0      0      4         0            0                0                0                0 

Note: $\phi_S^{40*}$ denotes $\phi_S^*(t, m_1, m_2, n)$ computed with 
$t_{\max}$ = 40 minutes. 

In addition, in cases in which there is a premature termination in 
the deterministic formulation, the optimal policy for Y in the corresponding 
stochastic problem is affected by the length of the "perceived planning 
horizon." This effect is shown in the data for the second battle of Table 
VII in which optimal policies are given for stochastic battles of varying 
lengths. We see that when the deterministic battle ends near to the 
scheduled end of the stochastic battle, Y follows a more conservative 
policy in the stochastic battle. Since there is some chance that Y cannot 
annihilate the X forces in the "perceived length of battle," he follows 
a conservative policy of firing at $X_2$. This might, in fact, explain the 
results for the first battle. Other similar phenomena have been encountered 
in cases not shown here. 

Finally, in Table VIII we show that the optimal policy followed by 
Y in a realization of the stochastic combat process may differ appreciably 
from that for the deterministic formulation if the realization does not 
"follow" the deterministic trajectory. It is seen that $\phi_S^*$ may 
repeatedly switch back and forth from 0 to 1 for certain realizations of 
the stochastic process. This is quite different from the corresponding 
behavior for the deterministic version. 

Table VIII. One Possible Dependence of Optimal Stochastic 
Control on Realization of Casualties in 
Stochastic Lanchester Attrition Process 
(Deterministic Trajectory Leads to Terminal State $S_7$; See Table VII.) 

Parameter Set 3, $t_{\max}$ = 50 minutes 

Elapsed Time, t         Force Levels 
(minutes)               $m_1$  $m_2$  $n$      $\phi_S^*(t,m_1,m_2,n)$ 

0                         3      5      5            0 
0.5                       3      4      5            0 
0.7                       3      3      5            1 
10.0                      2      3      5            1 
15.0                      2      3      4            0 
20.0                      2      2      4            1 
23.55                     2      2      4            0 
24.0                      2      2      3            0 
25.0                      2      1      3            1 
26.0                      1      1      3            1 
30.0                      1      1      2            0 
35.0                      1      0      2            1 

7. Discussion. 

In this section we discuss what we have learned from the above 
comparison. First and foremost, the authors feel that the deterministic 
formulation provides more insight into the structure of the optimal fire 
distribution policy. The explicit dependence of the optimal control upon 
various parameter groups (these are (1) $R = a_1 b_1/(a_2 b_2)$, 
(2) $\delta = a_1 p/(a_2 q)$, and (3) $\alpha = r\sqrt{b_2}/(q\sqrt{a_2})$) 
is readily obtained for the deterministic optimal control problem. This has 
not been true for the stochastic problem for which only the dependence upon 
$\delta$ has been analytically obtained. 
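
To make the role of the three parameter groups concrete, here is a minimal 
numerical sketch. It is not taken from the paper's own derivation, and it 
rests on assumptions that should be checked against problem (7): that Y 
maximizes $r\,y(T) - p\,x_1(T) - q\,x_2(T)$, that attrition follows 
$dx_1/dt = -\phi a_1 y$, $dx_2/dt = -(1-\phi) a_2 y$, 
$dy/dt = -b_1 x_1 - b_2 x_2$, and that no force is annihilated before $T$. 
Under those assumptions, with $\phi = 0$ near the end of battle, the 
switching function is proportional to 
$(R-1)(\cosh\theta + \alpha\sinh\theta) + \delta - R$ with 
$\theta = \sqrt{a_2 b_2}\,\tau$, so the backward switching time solves 
$\cosh\theta + \alpha\sinh\theta = (R-\delta)/(R-1)$. The function name and 
the coefficient values are hypothetical. 

```python
import math

def backward_switch_time(a1, a2, b1, b2, p, q, r):
    """Backward time of the switch in fire, under the assumptions stated above.

    Solves cosh(theta) + alpha*sinh(theta) = (R - delta)/(R - 1) for
    theta > 0 and returns theta / sqrt(a2*b2).
    """
    R = a1 * b1 / (a2 * b2)
    delta = a1 * p / (a2 * q)
    alpha = (r / q) * math.sqrt(b2 / a2)
    if not (0.0 <= delta < 1.0 < R) or alpha < 0.0:
        raise ValueError("sketch covers only 0 <= delta < 1 < R with alpha >= 0")
    target = (R - delta) / (R - 1.0)  # strictly greater than 1 in this case

    def g(theta):
        return math.cosh(theta) + alpha * math.sinh(theta) - target

    lo, hi = 0.0, 1.0                 # g(0) = 1 - target < 0
    while g(hi) < 0.0:                # g is increasing, so expand until bracketed
        hi *= 2.0
    for _ in range(100):              # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(a2 * b2)

# Hypothetical coefficients, chosen only so that 0 <= delta < 1 < R.
tau_1 = backward_switch_time(a1=2.0, a2=1.0, b1=1.0, b2=1.0, p=0.25, q=1.0, r=0.5)
print(round(tau_1, 4))
```

In this sketch the switching time depends on the data only through $R$, 
$\delta$, $\alpha$, and the time scale $\sqrt{a_2 b_2}$, which is the kind 
of explicit dependence the deterministic formulation makes visible. 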

Let us now summarize the observed differences and similarities 
between the structures of the optimal policies for the deterministic and 
stochastic formulations. The similarities are: (1) optimal policy always 
0 or 1, (2) same parameter groups ($R$, $\delta$, and $\alpha$) upon which 
optimal policy depends,† (3) optimal policy dependent upon force levels and 
whether Y wins or loses, (4) in both models $\phi^* = 1$ for $x_1 > 0$ when 
$\delta \ge 1$ and $R > 1$, and (5) $\phi^* = 0$ for $t \in (T - \tau_1, T]$ 
when $0 \le \delta < 1 < R$; furthermore $\tau_1 = \tau_1(\alpha)$.‡ The 
differences are: (1) in the stochastic formulation the optimal policy 
actually implemented (i.e. followed) in a battle depends upon the battle's 
probabilistic (forward) evolution (i.e. the realization of the stochastic 
process) and the time remaining in the prescribed duration battle, and 
(2) $\tau_1$ is "greater in the stochastic model" except for cases 
corresponding to premature termination in the deterministic battle. 
Overall, we feel that an understanding of the structure of an optimal 
policy is best developed by considering the deterministic version of such 
a combat problem. For problems too complex for analytic treatment, rules 
of thumb for approximating an optimal policy are probably best obtained 
from deterministic formulations. 

† In [34] and [36] one can find further discussion of the structure of the 
optimal policy, including interpretation of such parameter groups. The reader 
may find the following interpretations useful for understanding the solution 
to the problem studied in the paper at hand. The quantity $a_i b_i$ may be 
thought of as the rate of destroying $X_i$'s kill capability against Y. It is 
a measure of strategic (long run) return. The quantity $a_1 p$ represents the 
rate of destruction of $X_1$ value by Y at the end of battle. Thus, it 
represents short run return. The quantity $r\sqrt{b_2}$ reflects the loss of 
Y value at the end of battle so that $\alpha$ measures the loss of Y value 
relative to that of $X_2$ at the end of battle. 

‡ Moreover, $\tau_1$ depends upon $m_1$, $m_2$, and $n$ in the stochastic 
optimal control problem. 

Finally, we would like to point out that there is a circumstance 
under which the stochastic formulation is to be preferred over the 
deterministic one, namely, when there is a small number (approximately 
three or under) of each combatant type. As noted above, obtaining a 
numerical approximate solution to the optimal stochastic control problem 
is limited to small numbers of combatants due to computer memory 
requirements. In such cases of small numbers of combatants (and a 
stochastic attrition process), however, the stochastic formulation as a 
Markov chain is to be preferred when the required computer resources are 
available, for the obvious reason that the deterministic differential 
equation model cannot adequately describe the situation. This point made 
comparison of results from the two formulations difficult. 

8. Implications for Defense Planners. 

The authors feel that the study of even the very simplest abstractions 
(idealizations) of tactical allocation structures as considered in this 
paper has yielded significant implications for defense planners and 
military operations analysts. First and foremost is the fact that study 
of such deterministic optimal control problems provides much more insight 
into the structure of optimal allocation policies than corresponding stochastic 
formulations. We feel that such deterministic formulations provide a better 
understanding of the effects of modelling assumptions on optimal military 
strategies derived from the mathematical models. This is, of course, 
essential for determining optimal (or near-optimal) solutions to real world 
problems that are far too complex to be solved by exact analytic methods. 

Computer memory requirements grow exponentially as force levels increase 
because of the way in which a solution must be "built up." See Figure 1 and 
Table IV for illustrations of this point. 

Moreover, one might apply general principles or rules of thumb developed 
from the study of such idealizations to higher resolution studies which, 
for example, might use computer simulation methods. 

The study of the deterministic optimal control problem (7) in this 
paper yields several significant results which should be kept in mind by 
practitioners who perform more detailed computer simulation studies. 
These are: 

(1) Force levels do affect optimal strategies. Whether one "wins" or 
"loses" affects optimal strategies. 

(2) Even the nature of the scenario (terminal control or prescribed 
duration conflict) may affect optimal strategies. Thus, if one develops 
"good" tactics for a 90 day campaign, such tactics need not be "good" 
if the conflict does not terminate at the prescribed time. 

(3) The nature of the attrition process has a significant effect upon 
optimal strategies.† 

Finally, the authors feel that the above results indicate that more 
basic research should be done on the termination of battles and wars‡ as 
well as combat attrition theories. The demonstrated sensitivity of results 
obtained from optimization problems like the one considered here shows 
this. 

† This result has been pointed out elsewhere [36], [38] and is partially 
based on the study of a similar problem [38]. 

‡ Some work has been done in this direction [14], [33], [46], [47], although 
it does not appear to be widely known among practicing analysts. 

APPENDIX. Explanation of Notation. 

The symbols which are used in this paper are defined as follows: 

$a_1, a_2, b_1, b_2$ = constant attrition-rate coefficients, 

$A(m,n)$, $B(m,n)$ = attrition rates of X and Y forces, respectively, in 
stochastic battle; it should be noted that 

    Prob{one X casualty in time interval from $t$ to $t + \Delta t$} $\approx A(m,n)\,\Delta t$, 

$E_{\underset{\sim}{m},\tau}[\cdot]$ = conditional expectation (mathematical 
expectation of quantity in brackets at $\tau = 0$ given that at $\tau$ we 
have $\underset{\sim}{m}(\tau) = (m_1(\tau), m_2(\tau), n(\tau))$), 

$H$ = Hamiltonian function, 

$M_1(t), M_2(t), N(t)$ = the numbers (a random variable) of $X_1$, $X_2$, and 
Y combatants, respectively, at time $t$, 

$m_1, m_2, n$ = realizations of the random variables $M_1(t)$, $M_2(t)$, and 
$N(t)$; initial values denoted as $m_1^0$, $m_2^0$, $n^0$, 

$p, q, r$ = utilities assigned to surviving $X_1$, $X_2$, and Y forces, 
respectively, 

$p_i(t)$ for $i = 1, 2, 3$ = dual variable corresponding to $x_i(t)$ 
($x_3(t) = y(t)$), 

$\underset{\sim}{p} = (p_1, p_2, p_3)$ (a vector), 

$P(t,m,n)$ = Prob[$M(t) = m$, $N(t) = n$] = state probability, 

$P^0 = (x_1^0, x_2^0, y^0)$ = point in the initial state space, 

$R = a_1 b_1/(a_2 b_2)$, 

$S(\tau, m_1, m_2, n)$ = optimal expected value function, 

S = numerical approximation to $S(\tau, m_1, m_2, n)$, 

$S_i$ for $i = 1, \ldots, 8$ = the $i$th part of the terminal surface as 
defined in Table I, 

$s = s(x_1^0, x_2^0) = b_1 x_1^0 + b_2 x_2^0$, 

$t$ = time after beginning of battle, 



t^ = time at which is annihilated, i.e. x^(t^) = *-*> 

t„ = first time at which 2b x (t»)x° + b ? (x°) 2 = a„y 2 (t ) for an 

extremal leading to S., 

0 

t„ = last time at which fire is directed at X- for an extremal leading 
to S 3 , 

t^ = time at which X 2 is annihilated (before X^) , i.e. x 2 (t^) = 0, 
for an extremal leading to Sg, 

$t_f$ = time at which battle ends, 

$t_{\max}$ = maximum possible duration for battle, i.e. $t_f \le t_{\max}$, 

$v = v(\tau) = a_2 p_2(\tau) - a_1 p_1(\tau)$, 

$W(\tau, m_1, m_2, n)$ = "switching function" defined by equation (44), 

$x_1, x_2, y$ = average force strengths; with initial values $x_1^0, x_2^0, y^0$, 



z = cosh/a 2 b 2 x (S^) 

$\alpha = \dfrac{r\sqrt{b_2}}{q\sqrt{a_2}}$, 

$(R - \delta)/(R - 1)$, 

$\delta = a_1 p/(a_2 q)$, 

$\eta_i(t)$ for $i = 1, 2$ = multiplier corresponding to state variable 
inequality constraint $x_i \ge 0$, 

$\nu_i$ for $i = 1, 2$ = multiplier corresponding to state variable terminal 
inequality constraint $x_i(T) \ge 0$, 

$\phi$ ($\phi_S$) = fraction of Y-fire directed at $X_1$ in deterministic 
(stochastic) formulation; extremal and optimal controls denoted as $\phi^*$ 
($\phi_S^*$), 

$\{0, 1/n(t), 2/n(t), \ldots, (n(t)-1)/n(t), 1\}$ = set of admissible 
controls in stochastic problem, 

$\tau$ = "backwards time" from the end of battle defined by $\tau = t_f - t$, 
i.e. the time remaining before the end of battle, 

$\tau_1(S_i)$ = "backwards time" of the first switch in tactics for 
extremals leading to $S_i$. 



Additionally, remarks similar to those for t^(S^) above apply to 
ti( S i), etc. 
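
As an illustration of the admissible control set for the stochastic problem 
listed above, the following minimal sketch enumerates the fractions $k/n$ 
for a given number $n$ of surviving Y combatants. The function name is 
hypothetical, and exact rational arithmetic is used only to avoid 
floating-point clutter. 

```python
from fractions import Fraction

def admissible_controls(n):
    """Return the admissible fractions of Y-fire, {0, 1/n, ..., (n-1)/n, 1}."""
    if n < 1:
        raise ValueError("at least one Y combatant is required")
    return [Fraction(k, n) for k in range(n + 1)]

# With four surviving Y combatants there are five admissible allocations.
print([str(f) for f in admissible_controls(4)])
```

The endpoints 0 and 1 are always in the set, which is consistent with the 
observation that the computed optimal policy is always 0 or 1. 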